(54) A design environment and a design method for hardware/software co-design



(57) A design environment and a design method for 
implementing a heterogeneous essentially digital sys- 
tem is disclosed. The design environment comprises: 

a database compiled on a computer, adapted for ac- 
cess by executable programs on said computer for gen- 
erating the implementation of said heterogeneous es- 
sentially digital system, comprising a plurality of objects 
representing aspects of said digital system wherein said 
objects comprise primitive objects representing the 
specification of said digital system and hierarchical ob- 
jects being created by said executable programs while 
generating the implementation of said digital system, 
said hierarchical objects being refinements of said prim- 
itive objects and having more detail and preserving any 
one or all of said aspects to thereby generate said im- 
plementation of said digital system; and further compris- 
ing relations inbetween said primitive objects and inbe- 
tween said hierarchical objects and between said primi- 
tive objects and said hierarchical objects; and further 
comprising functions for manipulating said objects and 
said relations; means for specifying said heterogeneous 
digital system comprising a plurality of behavioral and 
structural languages; means for simulating said hetero- 
geneous digital system comprising a plurality of simula- 
tors for said behavioral and structural languages; means 
for implementing said heterogeneous digital system 
comprising a plurality of compilers for said behavioral 
and structural languages; means for allocating hardware
components for an implementation of said heterogeneous
digital system; means for assigning hardware subsystems
and software subsystems of said heterogeneous
digital system to said hardware components; means
for implementing the communication between said soft- 
ware subsystems and said hardware subsystems, one 
of the aspects of said communication being represented 
by ports; means for encapsulating said simulators, said 
compilers, said hardware components, said hardware 
subsystems and said software subsystems whereby cre- 
ating a consistent communication between said encap- 
sulated simulators, compilers, hardware components, 
hardware subsystems and software subsystems. 

Furthermore, a method is disclosed for making an
implementation of a heterogeneous essentially digital
system, comprising the steps of: defining a first set of 
primitive objects representing the specification of said 
digital system, comprising the steps of: describing the 
specification of said system in a plurality of processes, 
each process representing a functional aspect of said 
system, said processes being primitive objects; defining 
ports and connecting said ports with channels, said 
ports structuring the communication between said processes,
said ports and said channels being primitive objects,
one process having one or more ports; defining
the communication semantics of said ports by a proto- 
col, said protocol being a primitive object; and thereafter 
creating hierarchical objects being refinements of said 
primitive objects and having more detail, while preserv- 
ing aspects of said communication semantics. 
Description

Field of the Invention 

The present invention relates to a design environ- 
ment and a design method for hardware/software co- 
design. More specifically the hardware/software co-de- 
sign of this invention comprises the specification, syn- 
thesis, and simulation of heterogeneous systems. 

Background of the invention 

Digital communication techniques form the basis of 
the rapid breakthrough of modern consumer electron- 
ics, wireless and wired voice- and data networking prod- 
ucts, broadband networks and multi-media applications. 
Such products are based on digital communication sys- 
tems, which are made possible by the combination of 
VLSI technology and Digital Signal Processing. 

Digital systems perform real-time transformations 
on time discrete digitized samples of analogue quanti- 
ties with finite bandwidth and signal to noise ratio. These 
transformations can be specified in programming lan- 
guages and executed on a programmable processor or 
directly on application specific hardware. The choice is 
determined by trade-offs between cost, performance, 
power and flexibility. 

Hence digital systems are a candidate par excellence 
for hardware-software co-design. 

In contrast to analogue processing, digital process- 
ing guarantees perfect reproducibility, storage and test- 
ability. Signal quality is a matter of exact mathematical 
operations. The price paid is the cost of hardware and 
the performance needed to satisfy the hard real-time 
character. This problem is now solved by the abundance 
of digital VLSI (Very Large Scale Integration) technology 
which provides for cheap storage and high speed com- 
putation. Therefore, the combination of VLSI technology 
and digital processing has made possible the break- 
through of modern consumer electronics, portable and 
personal communication, broadband networks, multi- 
media, and automotive applications. 

The design process of the products for these appli- 
cations is subject to a number of constraints. A first con- 
straint is that they must be implemented in silicon or an- 
other hardware platform for power, performance and 
cost reasons. A second constraint is that these products 
implement systems conceived by a highly specialized 
system team thinking in terms of executable concurrent 
programming paradigms which, today, are not well un- 
derstood by hardware designers. Hence most specifica- 
tions are first translated into English and then rede- 
signed in a specific hardware description language such 
as VHDL or VERILOG for the hardware components 
and a software description language such as C or as- 
sembler for the software components. Although the 
hardware and software have tight interaction, both hardware
and software are designed separately. Only after system
assembly are the software and hardware run together. As a
consequence, the design can be far from optimal or even
erroneous, making a redesign cycle mandatory. This gap
between system design and implementation is rapidly becoming
the most important bottleneck in the design process of said
products or said systems. Another constraint is that, for
reasons of cost-effectiveness and time-to-market, there is a
need to increase design productivity by at least an order of
magnitude. Yet another constraint is that re-use of designs
as well as a design for re-use methodology will have to be
adopted. Said methodology implies hardware/software
co-design at several levels of implementation.

J. Buck et al., in "PTOLEMY: A framework for simulating and
prototyping heterogeneous systems" (International Journal on
Computer Simulation, January 1994), focus on an environment
for hardware/software co-simulation. The proposed methodology
allows hardware/software co-design only for systems based on
a data-flow algorithm. Furthermore, hardware/software
interface synthesis is not supported.

United States Patent No. 5,197,016 discloses a computer-aided
system and method for designing an application specific
integrated circuit whose intended function is implemented
both by hardware subsystems and software subsystems. The
proposed methodology only allows for a single processor
design and is only valid for specifications based on a state
transition diagram. The hardware/software co-design of
systems based on a heterogeneous specification is not
supported.

S. Narayan, F. Vahid, and D. Gajski, in "System specification
with the SpecCharts language" (IEEE Design & Test of
Computers, pages 6-13, December 1992), disclose a methodology
that builds on VHDL. The methodology does not support the
hardware/software co-design of systems based on a
heterogeneous specification.

P. Chou, R. Ortega, and G. Borriello, in "Synthesis of the
hardware/software interface in microcontroller-based systems"
(Proceedings of the IEEE International Conference on
Computer-Aided Design, ICCAD 92, pages 488-495, November
1992), show a method for hardware/software interface
generation for microcontroller based systems. Said method
assumes that the user determines the software interfacing,
such as the communication with drivers, before the start of
the system synthesis task.

None of the prior art solutions provides a design environment
based on a data-model that allows one to specify, simulate,
and implement or synthesize heterogeneous hardware/software
implementations starting from a heterogeneous system
specification. In the following paragraphs of this section an
analysis is made of the characteristics of specifications of
such heterogeneous systems.
Problem Definition

In the strictest sense, digital systems are algorithms
mapping digital signals into digital signals in real-time.
The real-time constraint is determined by the repetition
period of the algorithm for consuming an input frame and
producing a new output frame.

The periodicity of this constraint and the nature of the
signals imply that the elementary algorithm is a data-flow
function.

A Synchronous Data-Flow (SDF) algorithm can be modeled as an
acyclic graph where nodes are operators and edges represent
data-precedences. This graph is called a data-flow graph. An
operator can execute when a predetermined, fixed number of
data (tokens) are present at its input and then it produces
one or more output data (tokens). Conditional selection of
two operations to a single output is allowed. Operators have
no state but arrays resulting from one algorithm execution
can be saved for reuse in future executions (delay operator).
Many digital processing algorithms are of this type. They can
be described very efficiently by so-called applicative
programming languages like SILAGE.
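
The firing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Edge` and `Operator`, the helper `run`, and the two example operators are hypothetical and do not come from SILAGE or from any tool named in this text.

```python
from collections import deque


class Edge:
    """FIFO edge carrying tokens between two operators (hypothetical model)."""

    def __init__(self, initial=()):
        self.tokens = deque(initial)


class Operator:
    """Stateless operator with fixed consumption and production rates."""

    def __init__(self, name, func, inputs, outputs):
        self.name = name
        self.func = func        # one token list per input -> one token list per output
        self.inputs = inputs    # list of (edge, tokens consumed per firing)
        self.outputs = outputs  # list of (edge, tokens produced per firing)

    def can_fire(self):
        # Firing rule: every input edge holds at least its fixed token count.
        return all(len(edge.tokens) >= n for edge, n in self.inputs)

    def fire(self):
        args = [[edge.tokens.popleft() for _ in range(n)] for edge, n in self.inputs]
        results = self.func(*args)
        for (edge, n), produced in zip(self.outputs, results):
            assert len(produced) == n, "production rate must stay fixed"
            edge.tokens.extend(produced)


def run(operators):
    """Fire operators until none is enabled any more."""
    fired = True
    while fired:
        fired = False
        for op in operators:
            if op.can_fire():
                op.fire()
                fired = True


# Example graph: average pairs of samples (2 tokens in, 1 out), then apply a gain.
src, mid, out = Edge([1, 3, 5, 7]), Edge(), Edge()
average = Operator("average", lambda xs: [[sum(xs) / len(xs)]], [(src, 2)], [(mid, 1)])
gain = Operator("gain", lambda xs: [[2 * xs[0]]], [(mid, 1)], [(out, 1)])
run([average, gain])
result = list(out.tokens)
```

Because the token counts are fixed, the order of firings could also be computed before execution instead of being discovered at run time as `run` does here.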

In contrast, dynamic data-flow (DDF) algorithms
contain data-dependent token production and con- 
sumption. They allow for while and if-then-else con- 
structs. 

Computer-Aided Design (CAD) environments for digital systems
such as DSP-Station of Mentor Graphics, PTOLEMY, GRAPE-II, or
COSSAP all allow for specification of SDF and DDF and use
static scheduling as much as possible to provide simulation
speeds that are up to two orders of magnitude faster than
event driven simulators such as those in use for VHDL. This
justifies the use of these simulation paradigms for digital
system specification and validation.
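
Static scheduling of an SDF graph is possible because the number of firings of each operator per period can be computed up front. A common way to do so, not spelled out in this text, is to solve the balance equations: for every edge, firings of the producer times tokens produced must equal firings of the consumer times tokens consumed. The sketch below assumes a connected graph; the function name `repetition_vector` and the actor names are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Smallest positive firing counts that balance every edge.

    edges: list of (producer, produced_per_firing, consumer, consumed_per_firing).
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, prod, dst, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
            elif src in rate and dst in rate:
                if rate[src] * prod != rate[dst] * cons:
                    raise ValueError("inconsistent rates: no static schedule exists")
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the rational rates to the smallest integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}


# A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
schedule_counts = repetition_vector(
    ["A", "B", "C"],
    [("A", 2, "B", 3), ("B", 1, "C", 2)],
)
```

With these counts a simulator can fix the firing order once, which is what removes the event queue overhead of an event driven simulator.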

However, when we consider digital processing systems in the
broad sense, a wider scope is necessary, as illustrated in
FIGURE 1, which is an abstraction of many practical
implementations of digital processing systems. A careful look
at FIGURE 1 allows us to identify five common characteristics
of digital processing system specifications, numbered 1) to
5) in the sequel:


1) Digital systems typically comprise one (or more) signal
paths 1 as well as slow control loops 2 and a reactive
control system 3 taking events 4 of a slow environment such
as a user interface (UI) 5 and slow status information 6 of
the signal paths as inputs to control the mode or parameters
of the signal paths.

2) A signal path 1 is usually a concatenation of data-flow
functional blocks (DFBs) 7, such as h1, h2, L2, often
operating at fairly different data- and execution-rates and
transforming the format of the data. The rate and format
differences naturally result from operations such as:
frequency down- or up-conversion, bit to symbol modulation,
data-compression and error correction coding. When these DFBs
operate on unfragmented signal words, they can best be
specified as data-flow algorithms (e.g. in SILAGE, DFL, or
C). Others that manipulate individual bits of the signals can
be directly specified as Finite State Machines with Data
paths (FSMD) at VHDL register transfer or behavioral level.
Hence the specification format depends on the type of
data-flow functional block.

3) DFBs in the signal path are internally strongly
interconnected data-flow graphs with sparse external
communication. Hence, from an implementation viewpoint, they
are seldom partitioned over several hardware or software
components. Rather they will be merged onto the same
component if throughput and rate constraints allow it.
Merging implies sequentializing the concurrent processes on a
single component while still satisfying the timing
constraints. This requires software synthesis: encapsulation
techniques of single thread compilers in order to allow
real-time scheduling of concurrent processes.

4) Control loops and mode control by parameter setting are
common to almost all digital processing systems. For example,
all digital communication systems have tracking and
acquisition loops, in order to synchronise frequency and
phase of the receiver signal path to the characteristics of
the incoming signal. Design of these loops is one of the most
difficult tasks since their characteristics depend strongly
on noise and distortion properties of the communication
channel. It involves the design of phase-locked loops,
delay-locked loops, and fast Fourier transforms, controlled
by "events" disturbing the regularity of the signal streams.

The occurrence rate of these events is orders of magnitude
slower than the data-rate in the signal path. Hence, similar
to the UI, the processes modeling these slow control loops
and mode setting have reactive rather than data-flow
semantics.

They run concurrently with the data-flow and often consist
themselves of concurrent processes. Such a control dominated
system can be described as a Program State Machine (PSM),
which is a hierarchy of program-states, in which each
program-state represents a distinct mode of computation.
Formalisms such as StateCharts or SpecCharts, which include
behavioral hierarchy, exception handling and inter-process
communication modeling, are needed to describe such systems.
In practice, very often synchronization is specified in one
or more concurrent C programs.

5) Digital systems contain both high and low data-rate blocks
in the signal path. High data-rate blocks are synthesised
directly in hardware. Low data-rate blocks are candidates for
implementation on programmable processors. Hence digital
systems are natural candidates for hardware/software
co-design.
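
The Program State Machine mentioned under characteristic 4) can be illustrated with a small sketch. It is a hypothetical model, not the StateCharts or SpecCharts semantics: the class `ProgramState`, the functions `enter` and `dispatch`, and the receiver modes are assumptions chosen to show behavioral hierarchy, where a transition on an enclosing state applies to all of its sub-states.

```python
class ProgramState:
    """A program-state: one mode of computation, possibly with sub-states."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.transitions = {}  # event name -> target ProgramState
        self.initial = None    # default sub-state entered together with this state

    def on(self, event, target):
        self.transitions[event] = target
        return self


def enter(state):
    """Descend to the innermost default sub-state."""
    while state.initial is not None:
        state = state.initial
    return state


def dispatch(current, event):
    """Look for a transition from the active state outward.

    A transition defined on an enclosing state therefore behaves like an
    exception handler for every sub-state it contains.
    """
    state = current
    while state is not None:
        if event in state.transitions:
            return enter(state.transitions[event])
        state = state.parent
    return current  # no state reacts to this event


standby = ProgramState("standby")
receiver = ProgramState("receiver")
acquisition = ProgramState("acquisition", parent=receiver)
tracking = ProgramState("tracking", parent=receiver)
receiver.initial = acquisition

standby.on("power_on", receiver)
receiver.on("power_off", standby)  # valid in acquisition and tracking alike
acquisition.on("lock", tracking)
tracking.on("lock_lost", acquisition)

state = enter(standby)
trace = [state.name]
for event in ["power_on", "lock", "lock_lost", "lock", "power_off"]:
    state = dispatch(state, event)
    trace.append(state.name)
```

The events arrive far less often than signal samples, which is why such a model reacts to events rather than consuming tokens at a fixed rate.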

From the above it follows that digital systems re- 
quire a combination of data-models for their specifica- 
tion. Specification languages are tightly coupled to 
these data-models, paradigms, simulators, and synthe- 
sis tools. 

Nowadays, the dominant specification language of 
the digital system designer is C or a DFL for the main 
signal path whereas FSMDs and PSMs are usually de- 
scribed in a HDL. For the description of communication 
channels and communication protocols other formal- 
isms such as timing diagrams, Extended Signal Transi- 
tion Graphs, and Communicating Sequential Processes 
must be considered. A CAD system for digital systems 
must be able to encapsulate all these paradigms and 
their associated languages and design environments.

Digital systems design thus requires the ability to 
mix data-flow and reactive paradigms with widely different
time constants. The difference in time constants between
control- and data-flow poses special problems in
simulation. It requires all processes to be simulatable at 
the highest possible abstraction level. 

Not only the specification of a digital system is
heterogeneous by nature; its implementation architecture is
heterogeneous as well. An example implementation architecture
comprises the following types of components and the
communication between these components:

• programmable processors

• application specific processors with hardwired controller

• application specific processors with specialized instruction set

• hardware accelerators

• micro controllers

• communication blocks and memory

• peripherals (DMA, UART, ...)

Thus, a design method for a digital system must 
bridge the gap between the heterogeneous specifica- 
tion of the system and its heterogeneous implementa- 
tion. Today's synthesis tools and compilers allow us to 
synthesize or program all the processor-accelerator- 
memory components once the global system architec- 
ture has been defined. However, the availability of these 
component compilers is necessary, but not sufficient. 
What is needed are the models and tools to refine the 
functional specification of a system into the detailed ar- 
chitecture: the definition and allocation of the compo- 
nents and their communication and synchronization. 
The most essential step is to generate the necessary 
software and hardware to make processors, accelera- 
tors, and the environment communicate. 

One of the keys to mastering the complexity of digital system
design is the reuse of components. The design process for a
digital system must allow the modeling of reusable components
and support a design for reuse methodology that makes it
possible to design components that are easily reusable. The
problem in reusing previously designed components lies in the
fixed communication protocols they use, which necessitates
protocol conversions when processors with different protocols
have to be interfaced. Nowadays, the selection of a protocol
is done while designing the component: functional and
communication behavior are intrinsically mixed. However, a
good selection of the protocol is possible only when all
components involved in the communication are known.
Therefore, a design environment for digital systems has to
allow a component to be described initially in purely
functional terms. Later, when the component is (re)used in a
system, the design environment must allow the most
appropriate communication behavior to be plugged in. This
approach is in contrast with current hardware (VHDL) design
practices, where communication and functional behavior are
mixed.
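
The separation argued for above can be sketched as follows. This is a minimal illustration under assumptions: the function `scale_by_two`, the two protocol classes and the `Component` wrapper are hypothetical and are not part of the disclosed environment. The point shown is that the same functional description is reused unchanged while the communication behavior is chosen at the moment of reuse.

```python
from collections import deque


def scale_by_two(sample):
    """Purely functional component: no knowledge of how samples arrive or leave."""
    return 2 * sample


class FifoProtocol:
    """Buffered communication: every value written is delivered in order."""

    def __init__(self):
        self._queue = deque()

    def put(self, value):
        self._queue.append(value)

    def get(self):
        return self._queue.popleft()


class RegisterProtocol:
    """Unbuffered communication: only the most recent value is kept."""

    def __init__(self):
        self._value = None

    def put(self, value):
        self._value = value

    def get(self):
        return self._value


class Component:
    """Binds a functional description to ports whose protocol is plugged in later."""

    def __init__(self, function, in_port, out_port):
        self.function = function
        self.in_port = in_port
        self.out_port = out_port

    def step(self):
        self.out_port.put(self.function(self.in_port.get()))


# Reuse 1: buffered ports, every sample is processed.
fifo_in, fifo_out = FifoProtocol(), FifoProtocol()
buffered = Component(scale_by_two, fifo_in, fifo_out)
for sample in (1, 2, 3):
    fifo_in.put(sample)
for _ in range(3):
    buffered.step()
buffered_result = [fifo_out.get() for _ in range(3)]

# Reuse 2: register ports, only the latest sample is seen.
reg_in, reg_out = RegisterProtocol(), RegisterProtocol()
latched = Component(scale_by_two, reg_in, reg_out)
for sample in (1, 2, 3):
    reg_in.put(sample)
latched.step()
latched_result = reg_out.get()
```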

Another key to mastering the complexity of digital system
design is by means of modularity. In modular designs, the
complete system functionality is split into communicating
components of manageable complexity. The advantage of this
approach is that the components can be reused and that the
system is easier to adapt and maintain.

The disadvantage is the overhead because of the
inter-component communication or because the compiler does
not optimize over the component boundaries. Therefore, the
inter-component communication semantics should be such that
modularity can be removed easily when merging two components
into a single component.
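
A toy sketch of what removing modularity can mean: two processes that exchange data over a channel are merged into one sequential process, and the channel disappears. The names `CountingChannel`, `remove_offset`, `amplify`, `run_modular` and `merge` are hypothetical, and the transfer counter merely stands in for communication overhead.

```python
class CountingChannel:
    """Single-slot channel that counts transfers (stand-in for overhead)."""

    def __init__(self):
        self.transfers = 0
        self._slot = None

    def put(self, value):
        self.transfers += 1
        self._slot = value

    def get(self):
        return self._slot


def remove_offset(sample):
    return sample - 1


def amplify(sample):
    return sample * 10


def run_modular(samples, channel):
    """Two components communicating through an explicit channel."""
    outputs = []
    for sample in samples:
        channel.put(remove_offset(sample))
        outputs.append(amplify(channel.get()))
    return outputs


def merge(first, second):
    """Merge two processes into one; the channel becomes a plain value."""
    def merged(sample):
        return second(first(sample))
    return merged


channel = CountingChannel()
modular_out = run_modular([1, 2, 3], channel)
merged_process = merge(remove_offset, amplify)
merged_out = [merged_process(sample) for sample in [1, 2, 3]]
```

Both versions compute the same outputs; only the modular one pays for a transfer per sample.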

In the past, a lot of effort has been put into design
environments that allow the components of a digital system to
be implemented.

Languages with associated simulators, tuned towards specific
application domains, allow components to be specified and
simulated at a high abstraction level. Hardware compilers can
implement the component description into processors with
highly specialized architectures. Software compilers generate
machine code for off-the-shelf programmable processors.
Instruction set simulators allow the machine code to be
debugged at different levels of abstraction (C, asm).
Examples of such design environments are Cathedral-1/2/3, the
ARM processor tool suite (C-compiler and the ARMulator), and
the Synopsys synthesis tools. From the above it can be
concluded that the components of digital systems can be
implemented with off-the-shelf design environments. What is
missing is the glue that links these design environments
together and automatically interfaces the generated or
off-the-shelf processors according to the system
specification. Hence, a system design environment should
allow existing design environments to be included easily. It
should provide synthesis tools for hardware/hardware and
hardware/software interfacing that are processor and design
environment independent. To achieve this, the specification
method must allow off-the-shelf components to be modeled on
an as-is basis.

In summary, the following requirements can be defined for a
hardware/software system design environment.

• Modularity is essential to master complexity, but
the overhead should be minimal and removable.

• Different description languages are needed to allow 
each system component to be described with the 
most appropriate paradigm. 

• The design environment must be able to model the 
heterogeneous conceptual specification, the result- 
ing heterogeneous architecture and all refinement 
steps in between. 

• Off-the-shelf components and the associated de- 
sign environments need to be modeled. 

• A clear separation between functional and communication
behavior is required to allow designs to be reused.

• Processor independent interface synthesis is es- 
sential. 

Summary of the invention 

A design methodology and a design environment meeting the
above-stated requirements for a hardware/software system
co-design environment are disclosed in the present
application. A hardware/software co-design environment and
design methodology based on a data-model that allows one to
specify, simulate, and synthesize heterogeneous
hardware/software architectures from a heterogeneous
specification are disclosed. Said environment and said
methodology are based on the principle of encapsulation of
existing hardware and software compilers and allow for the
interactive synthesis of hardware/software and
hardware/hardware interfaces.

It is a first object of the present invention to disclose 
a design environment for implementing a heterogene- 
ous essentially digital system. Said system comprises: 
a database compiled on a memory structure 
adapted for access by executable programs on a com- 
puter for generating the implementation of said hetero- 
geneous essentially digital system, comprising a plural- 
ity of objects representing aspects of said digital system 
wherein said objects comprise primitive objects repre- 
senting the specification of said digital system and hier- 
archical objects being created by said executable pro- 
grams while generating the implementation of said dig- 
ital system, said hierarchical objects being refinements 
of said primitive objects and having more detail and
preserving any one or all of said aspects to thereby generate
said implementation of said digital system; and further
comprising relations inbetween said primitive objects and
inbetween said hierarchical objects and between said
primitive objects and said hierarchical objects; and further
comprising functions for manipulating said objects and said
relations. Said system further
comprises means for specifying said heterogeneous 
digital system comprising a plurality of behavioral and 
structural languages; means for simulating said hetero- 
geneous digital system comprising a plurality of simula- 
tors for said behavioral and structural languages; means 
for implementing said heterogeneous digital system 
comprising a plurality of compilers for said behavioral 
and structural languages; means for allocating hard- 
ware components for an implementation of said heter- 
ogeneous digital system; means for assigning hardware 
subsystems and software subsystems of said heteroge- 
neous digital system to said hardware components; 
means for implementing the communication between 
said software subsystems and said hardware subsys- 
tems, one of the aspects of said communication being 
represented by ports; means for encapsulating said sim- 
ulators, said compilers, said hardware components, 
said hardware subsystems and said software subsys- 
tems whereby creating a consistent communication be- 
tween said encapsulated simulators, compilers, hard- 
ware components, hardware subsystems and software 
subsystems; and means for creating processor models 
of said hardware components as objects in said data- 
base, said models comprising software models repre- 
senting the software views on said hardware compo- 
nents and hardware models representing the hardware 
views on said hardware components. 

In an aspect of the present invention, the design en- 
vironment further comprises means for creating I/O sce- 
nario models of said ports as objects in said database, 
said I/O scenario models representing the implementa- 
tion of said ports on said hardware components, said 
implementation comprising software subsystems, hard- 
ware subsystems, and processor models with connec- 
tions therebetween. 

In another aspect of the present invention, the im- 
plementation of the communication between a first soft- 
ware subsystem and a first hardware subsystem in the 
design environment results in said first software subsys- 
tem with a first port being replaced by a second hard- 
ware subsystem with a second port, said first port and 
said second port representing an essentially identical 
communication. 

In yet another aspect of the present invention, the design
environment further comprises means for selecting I/O
scenario models for the ports of said first software
subsystem; means for combining the software subsystems of
said selected I/O scenarios; and means for combining the
hardware subsystems of said selected I/O scenarios.

In the design environment, a first I/O scenario model can
represent the connection of said first port to said second
port, said connection comprising a connection of said first
port to said software subsystems of said I/O scenario model,
connections of said software subsystems of said I/O scenario
model to said software model, connections of said hardware
model to said hardware subsystems of said I/O scenario model,
and a connection of said hardware subsystems of said I/O
scenario model to said second port.

In the design environment, the I/O scenario models 
can further comprise memory mapped I/O scenarios, in- 
struction programmed I/O scenarios, and interrupt 
based I/O scenarios. 
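
As a reading aid, the I/O scenario model described above can be pictured as a small record. The sketch is an assumption about shape only: the type names `ScenarioKind`, `ProcessorModel` and `IOScenario`, the field names, and the example strings are hypothetical and do not reflect the actual database layout.

```python
from dataclasses import dataclass
from enum import Enum


class ScenarioKind(Enum):
    MEMORY_MAPPED = "memory mapped"
    INSTRUCTION_PROGRAMMED = "instruction programmed"
    INTERRUPT_BASED = "interrupt based"


@dataclass(frozen=True)
class ProcessorModel:
    name: str
    software_model: str  # software view on the hardware component
    hardware_model: str  # hardware view on the hardware component


@dataclass(frozen=True)
class IOScenario:
    kind: ScenarioKind
    software_subsystem: str
    hardware_subsystem: str
    processor: ProcessorModel

    def connection_path(self, first_port, second_port):
        """Chain from the software-side port to the hardware-side port."""
        return [
            first_port,
            self.software_subsystem,
            self.processor.software_model,
            self.processor.hardware_model,
            self.hardware_subsystem,
            second_port,
        ]


# Hypothetical example values, chosen only for illustration.
cpu = ProcessorModel("cpu", software_model="cpu load/store view",
                     hardware_model="cpu bus pins")
scenario = IOScenario(ScenarioKind.MEMORY_MAPPED,
                      software_subsystem="driver routine",
                      hardware_subsystem="address decoder",
                      processor=cpu)
path = scenario.connection_path("sw_port", "hw_port")
```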

In an aspect of the present invention, said implementation
is a simulation of said digital system. Said
simulation can be a multi-abstraction level simulation, 
said multi-abstraction level simulation comprising sub- 
stantially simultaneous low-level and high-level simula- 
tion. 

Said simulation can be a multi-platform simulation 
being executed on a plurality of computers. 

Said simulation can be a hybrid simulation compris- 
ing substantially simultaneous hardware implementa- 
tions and computer simulations. 

Said implementation can be a heterogeneous im- 
plementation comprising hardware subsystems and 
software subsystems, said software subsystems being 
executed on one or more of said hardware subsystems. 

Said hardware subsystems can comprise any one 
or more of processor cores, off-the-shelf components, 
custom components, ASICs, processors, or boards. 

Said software subsystems can comprise machine 
instructions for said hardware subsystems. 

It is a second object of the present invention to disclose a
method of making an implementation of a heterogeneous
essentially digital system, comprising the steps of:

defining a first set of primitive objects representing
the specification of said digital system, comprising
the steps of:

describing the specification of said system in a plu- 
rality of processes, each process representing a 
functional aspect of said system, said processes 
being primitive objects; 

defining ports and connecting said ports with chan- 
nels, said ports structuring the communication be- 
tween said processes, said ports and said channels 
being primitive objects, one process having one or 
more ports; 

defining the communication semantics of said ports
by a protocol, said protocol being a primitive object;
and thereafter

creating hierarchical objects being refinements of said
primitive objects and having more detail, while preserving
aspects of said communication semantics;

allocating one or more hardware components, said components
comprising programmable processors and non-programmable
processors;

assigning said processes to said hardware components, the
processes being assigned to a programmable processor being a
software subsystem, the other processes being hardware
subsystems; and

selecting I/O scenario models for the ports of said software
subsystem thereby connecting said ports to the interface of
said programmable processor and connecting the interface of
said programmable processor to second ports, said second
ports representing an essentially identical communication as
said ports.
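
The specification and assignment steps of the method can be pictured with plain data classes. This is a sketch under assumptions, not the disclosed database: the class names mirror the terms of the text (process, port, channel, protocol, hardware component), while the field names, the `partition` helper and the example objects are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Protocol:
    name: str


@dataclass
class Port:
    name: str
    protocol: Protocol


@dataclass
class Process:
    name: str
    ports: list = field(default_factory=list)


@dataclass
class Channel:
    source: Port
    sink: Port


@dataclass
class HardwareComponent:
    name: str
    programmable: bool


def partition(assignment):
    """Split assigned processes into software and hardware subsystems.

    assignment: list of (Process, HardwareComponent) pairs. Processes on a
    programmable processor form the software subsystem, the others are
    hardware subsystems.
    """
    software = [p.name for p, c in assignment if c.programmable]
    hardware = [p.name for p, c in assignment if not c.programmable]
    return software, hardware


# Primitive objects of a two-process specification (hypothetical example).
handshake = Protocol("handshake")
out_port = Port("samples_out", handshake)
in_port = Port("samples_in", handshake)
control = Process("control", [out_port])
filtering = Process("filtering", [in_port])
link = Channel(out_port, in_port)

# Allocation and assignment.
cpu = HardwareComponent("cpu", programmable=True)
accelerator = HardwareComponent("accelerator", programmable=False)
software, hardware = partition([(control, cpu), (filtering, accelerator)])
```

The ports of the processes in `software` are the ones for which an I/O scenario would then be selected.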

The method can further comprise the step of simulating said
system.

Said implementation comprises hardware and software
subsystems of said system, said software subsystems being
executed on one or more of said hardware subsystems.

The method can comprise the step of generating a netlist
comprising the layout information of said implementation.

Said hardware subsystems can comprise any one 
or more of processor cores, off-the-shelf components, 
custom components, ASICs, processors, or boards. 

In an aspect of the present invention, the method can further
comprise the step of refining the channel inbetween a first
and a second port of respectively a first and a second
hardware component, said first and said second port having an
incompatible protocol, thereby creating a hierarchical
channel, said hierarchical channel converting the first
protocol into the second protocol.
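
A hierarchical channel of this kind can be thought of as a primitive channel with a converter inserted between its ends. The sketch below assumes a very simple mismatch, a port that emits 16-bit words facing a port that accepts bytes; the names `word_to_bytes` and `HierarchicalChannel` are hypothetical and the conversion is only one possible example.

```python
def word_to_bytes(words):
    """Convert a stream of 16-bit words into big-endian byte pairs."""
    for word in words:
        yield (word >> 8) & 0xFF
        yield word & 0xFF


class HierarchicalChannel:
    """Refinement of a primitive channel that contains a protocol converter."""

    def __init__(self, converter):
        self.converter = converter

    def transfer(self, items):
        # What the first port writes is converted into what the second port reads.
        return list(self.converter(items))


channel = HierarchicalChannel(word_to_bytes)
received = channel.transfer([0x1234, 0xABCD])
```

Seen from outside, the refined channel still connects the same two ports, so the processes on either side do not change.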

The method can comprise the step of refining the channels
inbetween incompatible ports of hardware components, thereby
creating hierarchical channels, and the step of generating a
netlist comprising the layout information of said
implementation.

Brief description of the drawings 

35 FIGURE 1 is a schematic representation of a het- 
erogeneous digital system comprising various specifi- 
cation paradigms. 

FIGURE 2 is a flowchart representing the method 
for hardware/software co-design of the present inven- 

40 tion. 

FIGURE 3 is an illustration of the primitive objects 
in the database and the relations in between said prim- 
itive objects. 

FIGURE 4 is an illustration of the hierarchical ob- 
45 jects in the database and the relations in between said 
hierarchical objects and between the primitive and hier- 
archical objects. 

FIGURE 5 is an illustration of the process merge 
transformation. 
50 FIGURE 6 is a flowchart of a specific embodiment 
of the implementation process for hardware/software 
co-design. 

FIGURE 7 is a schematic representation of the 
functionality of the hardware/software interface genera- 
55 tion. 

FIGURE 8 is a schematic representation of a par- 
ticular I/O scenario modeled in the database. 

FIGURE 9 is a flowchart of a specific embodiment 
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of the construction of a hardware/software co-simula- 
tion. 

FIGURE 10 is a block diagram of a typical hetero- 
geneous digital system: the pager application. 

FIGURE 11 is a schematic representation of the 5 
pager application as described with the present inven- 
tion. 

FIGURE 12 is a block diagram of the pager after 
application of the process merge transformation. 

FIGURE 13 is a block diagram of the pager after the
communication channels have been tagged with a spe- 
cific communication behavior. 

FIGURE 14 is an illustration of the introduction of 
specific communication behavior in the pager applica- 
tion by refining a primitive channel into a hierarchical 
channel. 

FIGURE 15 is an illustration of the implementation 
of a process in hardware, whereby the resulting hard- 
ware subsystems are encapsulated to make them com- 
municate. 

FIGURE 16 shows the details of the encapsulation 
of the hardware subsystems in this particular applica- 
tion. 

FIGURE 17 is an illustration of the generation of a 
hardware/software interface between a software sub- 
system compiled on an ARM processor core and a hard- 
ware subsystem. 

Detailed description of the invention 

In the sequel a design environment and a design 
methodology meeting the requirements of modularity, 
encapsulation of different description languages, mod- 
eling from a heterogeneous conceptual specification to 
a resulting heterogeneous architecture and all refinement
steps inbetween, modeling capabilities for off-the-shelf
components and the associated design environments,
separation between functional and communication behavior,
and processor independent interface synthesis are disclosed.
Said design environment is called CoWare in the sequel.

In the sequel it is to be understood that the concept 
refinement means converting or translating or trans- 
forming a specification of an electronic system into an 
implementation. Said implementation can be an archi- 
tecture of components that has the same behavior as 
said specification or that executes said specification. 
Said implementation can also be an added development 
in the chain leading to a final implementation. Adding 
detail means that an implementation is made more spe- 
cific or concrete as a result of an implementation deci- 
sion on a previous level in the chain leading to a final 
implementation. To detail can also mean adding a ma- 
terial object such as a specific component or a specific 
communication inbetween components, as opposed to 
an abstract aspect on a previous level in the chain lead- 
ing to a final implementation. Other instances of the con- 
cept refinement and of the concept detail are to be found
in the sequel.

FIGURE 2 shows the architecture of the CoWare 
system. 

The CoWare system supports four major design ac- 
tivities: co-specification 8, co-simulation 9, co-synthesis 
10 and interface synthesis 11. The input is a heteroge- 
neous specification of an electronic system, the output 
12 is a netlist for prior-art commercial tools for the gen- 
eration of the implementation layout. Said output pref- 
erably comprises structural VHDL or Verilog and ma- 
chine code for the programmable processors. 

The CoWare design environment is implemented 
on top of a data model in which modularity is provided 
by means of processes. Processes contain host lan- 
guage encapsulations which are used to describe the 
system components. Communication between process- 
es takes place through a behavioral interface compris- 
ing ports. For two processes to be able to communicate, 
their ports must be connected with a channel. The inter- 
process communication semantics is based on the con- 
cept of the Remote Procedure Call (RPC). The data 
model is hierarchically structured and allows channels,
ports, and protocols to be refined into lower level objects,
adding detail. We refer to the most abstract object as a
primitive object. An object that contains more implemen- 
tation detail, is referred to as a hierarchical object. 

We first discuss the primitive objects. The hierarchi- 
cal objects are used to refine the communication behav- 
ior of the system and are discussed afterwards. 

A process is a container for a number of host lan- 
guage encapsulations of a component. A single process 
can have multiple host language encapsulations de- 
scribing different implementations for the same compo- 
nent, or for the same component represented at differ- 
ent abstraction levels. 

A host language encapsulation describes a component
in a specific host language. Preferably C, C++,
DFL, VHDL and Verilog are supported host languages.
A CoWare language encapsulation is used to describe 
the system's structure. In a CoWare language encapsu- 
lation, one can instantiate processes and connect their 
ports with channels. 

Other host language encapsulations comprise con- 
text and a number of threads. The context and thread 
contain code written in the host language of the encap- 
sulation. The context contains code that is common to 
all threads in the encapsulation, i.e. variables/signals 
and functions as allowed by the semantics of the host 
language. As such the context provides for inter-thread 
(intra-process) communication. 

Each primitive CoWare process (symbolized by an 
ellipse 13 in FIGURE 2) encapsulates concurrent pro- 
gram threads in a host language of choice. Concurrent 
threads communicate over shared memory inside a 
process. Inter-process communication is over uni-directional
channels using a Remote Procedure Call (RPC)
protocol. The reasons for this choice are explained
below.

Notice that in this way heterogeneous specification
is supported: both hardware and software aspects,
structural and behavioral aspects, and different specification
paradigms (data-flow, control-flow, ...) can be
combined.

Co-specification allows a functional specification to be
described based on the concept of communicating
CoWare processes.

An important concept of CoWare is that basically no 
distinction is made between co-simulation and co-syn- 
thesis. Both are based on the concept of refining the 
specification for implementation, re-using existing com- 
pilers, emulators, and simulation processes. 

In refinement for co-synthesis the designer per- 
forms an interactive coarse partitioning of the specifica- 
tions over a user allocated architecture. This leads to a 
merger of component compiler consistent processes to 
be mapped on the same component. Component com- 
piler consistent processes have an encapsulation in the 
same host language. Merging consists of in-lining the 
RPC calls between said processes and leads to two 
subproblems: the mapping of the concurrent threads in 
the processes on a processor re-using existing compo- 
nent compilers 14, 15, 16, 17, 18, 19 and the refinement 
of the communication between processes into hardware 
and software communication protocols that implement
it. The implementation of concurrent threads and intra- 
process communication must be taken care of by using 
Real-Time Operating Systems (RTOS), micro-kernels 
or software synthesis in case of programmable proces- 
sors or by providing a library based communication pro- 
tocol shell around the existing hardware synthesis tools. 
Refinement of the inter-process communication means 
again a refinement of the primitive RPC communication 
by expanding the communication ports into implementa- 
ble protocols available in a protocol library 20. It is also 
possible to assign channel processes to abstract chan- 
nels. 

In principle all of this is open to the user who can 
add his own library for communication protocols. On the
other hand CoWare provides in the SYMPHONY tool- 
box 21 a methodology for interface synthesis whereby 
every communication channel is refined by selection of 
a communication scenario. In this way automated syn- 
thesis of hardware/hardware and hardware/software in- 
terfaces, including the generation of the software drivers 
in programmable processors is possible. This is an es- 
sential part of hardware/software co-design. 

After the compilation of all components, all hard- 
ware is available as structural VHDL and all software for 
the processors is in C which can be compiled on the 
host compiler of the programmable components. The fi- 
nal step is to link all the synthesis and hardware descrip- 
tions to drive commercial back-end tools to generate 
layout. 

In FIGURE 3, the processes system 22 and subsys- 
tem 23 contain a CoWare language encapsulation. The 
CoWare language encapsulation of system 22 describes
how it is built up from an instance of subsystem
23 and an instance of P4 (24). The processes P1 (25), 
P23 (26), and P4 (24) each contain a C language en- 
capsulation. 

Ports are objects through which processes commu- 
nicate. A primitive port is characterized by a protocol and 
a data type parameter. 

There is one implicit port, the construct port, to 
which an RPC is performed exactly once at system start- 
up. 

In FIGURE 3 the process P23 (26) has two primitive 
ports p2 (27) and p3 (28), next to the implicit construct 
port. 

Protocols define the communication semantics of a 
port. A primitive protocol is one of master, inmaster,
outmaster, inoutmaster, slave, inslave, outslave,
inoutslave. Each primitive protocol indicates another way of
data transport. The in, out, and inout prefix indicates the 
direction of the data. The master, slave postfix indicates 
the direction of the control: whether the protocol acti- 
vates an RPC (master) or services an RPC (slave). In 
the remainder of this text, ports with a slave/master pro- 
tocol are usually referred to as slave/master ports. 
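
The naming rule for primitive protocols can be made concrete with a small sketch. It is not part of the CoWare tooling; the names Protocol and parse_protocol are hypothetical, and the empty direction for the plain master and slave protocols simply records that the name carries no in/out/inout prefix.

```python
from dataclasses import dataclass

# The eight primitive protocols named in the text.
PRIMITIVE_PROTOCOLS = (
    "master", "inmaster", "outmaster", "inoutmaster",
    "slave", "inslave", "outslave", "inoutslave",
)


@dataclass(frozen=True)
class Protocol:
    direction: str  # "in", "out", "inout", or "" when the name has no prefix
    control: str    # "master" activates an RPC, "slave" services an RPC


def parse_protocol(name: str) -> Protocol:
    """Split a primitive protocol name into data direction and control role."""
    if name not in PRIMITIVE_PROTOCOLS:
        raise ValueError(f"not a primitive protocol: {name!r}")
    for control in ("master", "slave"):
        if name.endswith(control):
            return Protocol(direction=name[: -len(control)], control=control)
    raise AssertionError("unreachable: every primitive protocol has a postfix")
```

For instance, parse_protocol("outmaster") yields direction "out" and control "master", which matches port p1 (29) of FIGURE 3.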

In FIGURE 3 master ports (29, 30) are represented 
by the small shaded rectangles on a process's perimeter.
Slave ports (27, 28) are represented by small open rec- 
tangles on the perimeter. The data direction of a port is 
represented by the arrow that connects to a port. In FIG- 
URE 3 port p1 (29) is an outmaster port and port p2 (27) 
is an inslave port. 

A protocol may further have an index set. The indi- 
ces in the index set are used to convey extra information 
about the data that is transported. For example the prim- 
itive protocol used to model the memory port of a proc- 
essor will have an index to model the address of the data 
that is put on the memory port. 

A thread is a single flow of control within a process. 
A thread contains code in the host language of the en- 
capsulation of which the thread is a part. The code in a 
thread is executed according to the semantics of the 
host language. We distinguish between slave threads 
and autonomous threads. 

Slave threads are uniquely associated to slave 
ports and their code is executed when the slave port is 
activated (i.e. when an RPC is performed to the slave 
port). There is one special slave thread which is asso- 
ciated to the implicit construct port and can be used to 
initialize the process. 

In FIGURE 3 the process P23 (26) contains two regular
slave threads (31, 32) associated to the slave ports
p2 (27) and p3 (28) respectively, next to the special con- 
struct slave thread (33). 

Autonomous threads are not associated to any port 
and their code is executed, after system initialization, in 
an infinite loop. 

In FIGURE 3 processes P1 (25) and P4 (24) each 
contain an autonomous thread (34). 

A language encapsulation can contain multiple
slave and autonomous threads that, in principle, all execute
concurrently.

A channel is a point-to-point connection of a master 
port and a slave port. Two ports that are connected by 
a channel can exchange data. Channels can be uni- or
bi-directional. A primitive channel provides for unbuff- 
ered communication. It has no behavior: it is a medium 
for data transport. In hardware it is implemented with 
wires. In software it is implemented with a (possibly in- 
lined) function call. In this way, primitive channels model 
the basic communication primitives found in software
and hardware description languages.

In the strict sense only point-to-point channels con- 
necting one master to one slave port are allowed. How- 
ever, a person skilled in the art, can easily remove this 
restriction to allow channels connecting two master 
ports or two slave ports, or to allow channels connecting 
multiple slave and master ports. 

Such an extended description, can be transformed 
into the basic model, by using a default or user-defined 
translation scheme. 

In FIGURE 3, there is a primitive channel (35) that 
connects port p1 (29) of process P1 (25) with port p2 
(27) of process P23 (26). 

Communication always happens between two 
threads. Communication between threads that are part 
of the same process is denoted as intra-process com- 
munication. Communication between threads in differ- 
ent processes is denoted as inter-process communica- 
tion. 

Intra-process (inter-thread) communication is done 
by making use of shared variables/signals that are de- 
clared in the context of the process. Avoiding that two 
threads access the same variable at the same time is 
host language dependent. It is the user's responsibility 
to protect critical sections using the mechanisms pro- 
vided in the host language. 
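
As an illustration of protecting a critical section with a host language mechanism, the following sketch uses Python as a stand-in host language. The class Context and the function names are hypothetical; the shared variable tmp mirrors the one declared in the context of FIGURE 3.

```python
import threading


class Context:
    """Hypothetical stand-in for the context of one process."""

    def __init__(self):
        self.tmp = 0                   # variable shared by all threads
        self.lock = threading.Lock()   # host language protection mechanism


def slave_thread(ctx, increments):
    """Thread body that updates the shared variable inside a critical section."""
    for _ in range(increments):
        with ctx.lock:
            ctx.tmp += 1


def run(n_threads=4, increments=1000):
    """Start several concurrent threads on one context and return the final tmp."""
    ctx = Context()
    threads = [
        threading.Thread(target=slave_thread, args=(ctx, increments))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ctx.tmp
```

With the lock in place every increment is applied, so the result equals n_threads times increments.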

In FIGURE 3, intra-process communication occurs 
in process P23 (26). 

The variable tmp (36) declared in the context (37) 
is shared by slave thread p2 (31) and slave thread p3
(32). 

Inter-process (inter-thread) communication with a 
primitive protocol is RPC based. On a master port, the 
RPC function can be used to initiate a thread in a remote
process. A master port can be accessed from anywhere 
in the host language encapsulation (context, autono- 
mous threads, slave threads) with the exception of the 
construct thread. 

The RPC function returns when the slave thread 
has completed, i.e. when all the statements in the slave 
thread's code are executed. In the slave thread (unique- 
ly associated with a slave port), the Read and Write 
functions can be used to access the data of the slave 
port. 

The Index function is used to access the indices of 
the protocol of the port. The RWbar function is used on 
an inoutslave port to determine the actual direction of
the data transport. A slave port can only be accessed
from within its associated slave thread. 

A bi-directional port can be used to both send and 
receive data. 

However, according to the strict RPC semantics this 
cannot be done by the same RPC call. In a single RPC 
call, one uses the bi-directional port either in the input 
or in the output direction but not in both directions. For 
a person skilled in the art, it is easy to extend the strict 
RPC semantics to full fledged function call semantics 
where arguments are passed to a remote procedure and 
results are received back. 
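
A minimal sketch of how a slave thread might use these accessors on an inoutslave port is given below. The source names the functions Read, Write, Index and RWbar but gives no signatures, so the lower-case methods, the class InoutSlavePort, and the convention that rwbar() returns True when the master reads are all assumptions made for illustration.

```python
class InoutSlavePort:
    """Hypothetical inoutslave port with a single protocol index."""

    def __init__(self, slave_thread):
        self._slave_thread = slave_thread
        self._data = None
        self._index = None
        self._master_reads = False

    # Master side: a single RPC moves data in exactly one direction.
    def rpc_write(self, index, value):
        self._index, self._data, self._master_reads = index, value, False
        self._slave_thread(self)

    def rpc_read(self, index):
        self._index, self._data, self._master_reads = index, None, True
        self._slave_thread(self)
        return self._data

    # Slave side: accessors meant to be called only from the slave thread.
    def read(self):
        return self._data

    def write(self, value):
        self._data = value

    def index(self):
        return self._index

    def rwbar(self):
        return self._master_reads


def make_memory():
    """Build a small memory process whose slave thread serves the port."""
    store = {}

    def slave_thread(port):
        if port.rwbar():                      # master wants data back
            port.write(store.get(port.index(), 0))
        else:                                 # master delivers data
            store[port.index()] = port.read()

    return InoutSlavePort(slave_thread), store
```

Each call to rpc_write or rpc_read is one RPC, and each uses the port in one direction only, in line with the strict RPC semantics.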

In FIGURE 3, inter-process communication occurs 
between processes P1 (25) and P23 (26) over the channel
(35). When the RPC statement (38) in the autonomous
thread (34) is reached, the value of the local variable
data (39) is put on the channel (35), and control
is transferred to the slave thread p2 (31). The autonomous
thread (34) is halted until the last statement of the
slave thread (31) is executed. After that the autonomous
thread (34) resumes by executing the statement (40) after
the RPC statement (38).
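
The control transfer described for FIGURE 3 can be sketched as a plain synchronous call. The class PrimitiveChannel and the function simulate are hypothetical names; only one pass of the autonomous thread is modeled, whereas the text describes it as an infinite loop.

```python
class PrimitiveChannel:
    """Unbuffered channel without behavior: an RPC is a plain call."""

    def __init__(self, slave_thread):
        self._slave_thread = slave_thread

    def rpc(self, value):
        # Returns only after the slave thread has run to completion.
        self._slave_thread(value)


def simulate(data):
    """Run one RPC from P1 to P23 and return the event trace and tmp."""
    trace = []
    context = {"tmp": None}            # context of process P23

    def slave_thread_p2(value):        # slave thread attached to port p2
        trace.append("p2: start")
        context["tmp"] = value
        trace.append("p2: done")

    channel = PrimitiveChannel(slave_thread_p2)

    def autonomous_thread_p1():        # one pass of P1's autonomous thread
        trace.append("P1: before RPC")
        channel.rpc(data)              # P1 is halted until p2 completes
        trace.append("P1: after RPC")

    autonomous_thread_p1()
    return trace, context["tmp"]
```

The trace shows that the statement after the RPC runs only once the slave thread has finished.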

By using primitive channels, ports, and protocols, 
the designer first concentrates on the functionality of the 
system while abstracting from terminals, signals, and 
handshakes. Once the designer is convinced that the 
processes of the system are functionally correct, the 
communication behavior of the system can be refined. 
Communication refinement in CoWare is carried out by 
making the objects involved in the communication 
(channels, ports, and protocols) hierarchical. 

Hierarchical channels are processes that assign a 
given communication behavior to a primitive channel. 
The behavioral interface of a hierarchical channel is 
fixed by the ports connected by the primitive channel. 
Making a channel hierarchical, can drastically change 
the communication behavior of two connected process- 
es. It can, for example, parallelize (pipeline) the proc- 
esses by adding buffers. The one property that is pre- 
served by making a channel hierarchical is the direction 
of the data transport. 

In FIGURE 4, the primitive channel (41) between 
processes P1 (42) and P23 (43) is refined into a hierar- 
chical channel (44) with FIFO behavior. The FIFO hier- 
archical channel (44) decouples the autonomous thread 
of process P1 (42) and the slave thread associated with 
port p2 (45) of process P23 (43). The effect is that the
rate at which process P1 (42) can issue RPCs is no long- 
er determined by the rate at which process P23 (43) can 
service the RPCs. The FIFO hierarchical channel (44) 
takes care of the necessary buffering of data. 
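
The decoupling effect of a FIFO hierarchical channel can be sketched with a bounded queue and a worker thread. The class FifoChannel, its depth parameter and the flush helper are assumptions for illustration and do not describe the CoWare library.

```python
import queue
import threading


class FifoChannel:
    """Hypothetical FIFO channel between a master port and a slave thread."""

    def __init__(self, slave_thread, depth=8):
        self._fifo = queue.Queue(maxsize=depth)
        self._slave_thread = slave_thread
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def rpc(self, value):
        # Returns once the value is buffered; blocks only when the FIFO is full.
        self._fifo.put(value)

    def _drain(self):
        # Services buffered RPCs at the rate of the slave thread.
        while True:
            value = self._fifo.get()
            self._slave_thread(value)
            self._fifo.task_done()

    def flush(self):
        """Wait until every buffered value has been serviced."""
        self._fifo.join()
```

The master side returns from rpc without waiting for the slave thread, while the single worker preserves the order of the data.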

Hierarchical ports are processes that assign a given 
communication behavior to a primitive port. The behav- 
ioral interface of the hierarchical port is partially fixed by 
the primitive port it refines. The hierarchical port process 
should have one port, which we call the return port, that 
is compatible with the primitive port. Making a primitive 
port hierarchical, preserves the data direction (in/out).
Two ports are compatible if their primitive protocols are
compatible, if they have equal data type, and if they have
equal protocol indices. The following primitive protocols
are compatible: (master, slave); (inslave, outmaster);
(inslave, inoutmaster); (outslave, inmaster); (outslave,
inoutmaster); (inoutslave, inmaster); (inoutslave,
outmaster); (inoutslave, inoutmaster). Two hierarchical
protocols are compatible if their primitive protocols
are compatible and they have the same name.
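
The compatibility rule for primitive ports can be written down as a small check. The Port record and the function names are hypothetical, and the sketch assumes that the listed pairs are unordered.

```python
from dataclasses import dataclass

# Compatible primitive protocol pairs as listed in the text.
COMPATIBLE = {
    ("master", "slave"),
    ("inslave", "outmaster"),
    ("inslave", "inoutmaster"),
    ("outslave", "inmaster"),
    ("outslave", "inoutmaster"),
    ("inoutslave", "inmaster"),
    ("inoutslave", "outmaster"),
    ("inoutslave", "inoutmaster"),
}


@dataclass(frozen=True)
class Port:
    protocol: str          # primitive protocol name
    data_type: str         # data type parameter
    indices: tuple = ()    # protocol index set


def protocols_compatible(a: str, b: str) -> bool:
    """True if the two primitive protocols form a listed pair, in either order."""
    return (a, b) in COMPATIBLE or (b, a) in COMPATIBLE


def ports_compatible(p: Port, q: Port) -> bool:
    """Compatible protocols, equal data type, and equal protocol indices."""
    return (
        protocols_compatible(p.protocol, q.protocol)
        and p.data_type == q.data_type
        and p.indices == q.indices
    )
```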

In FIGURE 4 we impose a certain data formatting 
for the data transported over the channel (41) between
process P1 (42) and the FIFO hierarchical channel (44). 
This is achieved by making the primitive ports p1 (46) 
and left (47) hierarchical. The format process (48) that 
refines port p1 (46) might for example add a cyclic redundancy
check to the data that is transported. The un-
format process (49) that refines port left (47) of the FIFO 
hierarchical channel (44) then uses this cyclic redun- 
dancy check to determine whether the received data is 
valid. The actual data and the cyclic redundancy check 
are sent sequentially over the same primitive channel 
between ports op (50) and ip (51). As a consequence,
the data rate between the format (48) and unformat (49)
process is twice that of process P1 (42).
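
The format/unformat pair can be sketched as follows. The use of CRC-32 from zlib, the 4-byte big-endian encoding, and the names format_port and Unformat are assumptions; the text only states that a cyclic redundancy check is sent after the data over the same channel.

```python
import zlib


def format_port(value: bytes, send):
    """Send the data, then its CRC, as two sequential transfers."""
    send(value)
    send(zlib.crc32(value).to_bytes(4, "big"))


class Unformat:
    """Receives data and CRC alternately and delivers only valid data."""

    def __init__(self, deliver):
        self._deliver = deliver
        self._pending = None

    def receive(self, item: bytes):
        if self._pending is None:
            self._pending = item           # first transfer: the data itself
            return
        data, self._pending = self._pending, None
        if zlib.crc32(data).to_bytes(4, "big") == item:
            self._deliver(data)            # second transfer matched: valid
```

Each data item causes two transfers, which is the doubling of the data rate mentioned above.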

Primitive protocols provide a classification of all hi- 
erarchical protocols. A primitive protocol determines the 
communication semantics, but not the communication 
implementation: it does not fix the timing diagram used 
in the communication. Hierarchical protocols refine 
primitive protocols with a timing diagram and the asso- 
ciated I/O terminals. Hierarchical protocols are high lev- 
el models for alternative implementations of a primitive 
protocol: they preserve both data direction and control 
direction of the primitive protocol. 

To access the terminals of the hierarchical protocol, 
a hierarchical port is introduced at the same time. The 
terminals can be accessed from within the thread code 
by using the functions Put, Sample, and Wait. 

In FIGURE 4, the primitive protocols of the op port (50)
and ip port (51) of the format (48) and unformat (49)
processes are refined into an RS232 protocol (52). In the
RS232 hierarchical port (53), an RPC (54) issued in the
format process (48) on the op port (50) is converted into
manipulations of the terminals (55) according to a timing
diagram (56).
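
How a hierarchical port might turn one RPC into terminal manipulations can be sketched with a simplified asynchronous serial frame (start bit, eight data bits least significant first, stop bit). This framing is an assumption chosen for illustration and is not the timing diagram (56) of the figure; the Terminal class and its put and wait methods are hypothetical stand-ins for the Put and Wait functions.

```python
class Terminal:
    """Hypothetical single-wire terminal that records one level per bit period."""

    def __init__(self):
        self.level = 1        # line idles high
        self.history = []

    def put(self, level):
        self.level = level

    def wait(self):
        # One bit period elapses; record what was on the wire.
        self.history.append(self.level)


def serial_port_rpc(terminal, byte):
    """Convert one RPC carrying a byte into a ten-bit frame on the terminal."""
    terminal.put(0)                        # start bit
    terminal.wait()
    for i in range(8):                     # data bits, LSB first
        terminal.put((byte >> i) & 1)
        terminal.wait()
    terminal.put(1)                        # stop bit
    terminal.wait()


def decode(frame):
    """Recover the byte from a ten-bit frame, checking start and stop bits."""
    if len(frame) != 10 or frame[0] != 0 or frame[9] != 1:
        raise ValueError("malformed frame")
    return sum(bit << i for i, bit in enumerate(frame[1:9]))
```

The process that issues the RPC is unaware of the frame; only the hierarchical port knows the terminals and their timing.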

The CoWare model is implemented on a computer 
or on a plurality of computers and a set of application 
programmer's interface (API) functions is available. 

When a CoWare system description is parsed, a 
representation of the system in memory is created in 
which the objects of the description are related to each 
other. All tools of the CoWare environment use these 
API functions to analyze, manipulate, and refine the sys- 
tem description. 

Due to the selection of RPC as inter-process com- 
munication, the classification of protocols and the struc- 
turing of a process in encapsulations with context and 
threads, a process merge transformation can be implemented.

The goal of this transformation is to combine a 
number of process instances, that are described in the 
same host language, into a single host language encap- 
sulation that can then be mapped by a host language 
compiler onto a single processor. 

In the process of merging, all remote procedure 
calls are in-lined: each slave thread is in-lined in the 
code of the thread that calls it through an RPC state- 
ment. Because of the semantics of RPC communica- 
tion, this transformation does not alter the behavior of 
the original system, provided that care is taken to avoid 
name clashes. The result of merging is a single host lan- 
guage encapsulation that contains a single context, a 
single construct thread, one or more autonomous
threads, possibly multiple slave threads to service RPC
requests from external processes (not involved in the process
merge transformation), and possibly multiple RPCs to
slave threads in external processes (not involved in the
process merge transformation).
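
The effect of in-lining can be sketched by writing the same two-process system before and after the merge. The doubling computation and all names are hypothetical; the point is only that the merged version contains no call and produces the same result.

```python
def run_unmerged(samples):
    """P1 and P23 as separate processes linked by an RPC."""
    context = {"tmp": 0}               # context of P23
    out = []

    def slave_p2(value):               # slave thread of P23
        context["tmp"] = value * 2
        out.append(context["tmp"])

    def autonomous_p1():               # autonomous thread of P1
        for data in samples:
            slave_p2(data)             # RPC over the primitive channel

    autonomous_p1()
    return out


def run_merged(samples):
    """Single encapsulation after the merge: the slave thread is in-lined."""
    p23_tmp = 0                        # renamed to avoid a name clash
    out = []
    for data in samples:
        p23_tmp = data * 2             # body of the former slave thread
        out.append(p23_tmp)
    return out
```

Because the RPC returns only when the slave thread has completed, replacing the call by the thread body leaves the observable behavior unchanged.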

FIGURE 5 shows the effect of merging the process 
instances (25, 26) in the subsystem process (23) of FIG- 
URE 3. The subsystem process (57) has a CoWare lan- 
guage encapsulation. After merging the instances (58, 
59), we obtain a C language encapsulation (60) which 
is added to the subsystem process. 

The benefit of merging processes is that the in-lin- 
ing transformation eliminates the overhead that accom- 
panies execution of (remote) procedure calls. It further 
reduces the number of concurrent threads and, there- 
fore, the overhead that accompanies the switching be- 
tween threads. Finally, it allows the host language com- 
pilers to optimize over the boundaries of the original 
processes. 

The port and protocol hierarchy provides a clear 
separation between functional and communication be- 
havior. Traditionally, the description of a component 
contains both functional and communication behavior in 
an interleaved way. When such a component has to be 
re-used, in an environment other than it was intended 
for, the designer has to change those parts of the de- 
scription of the component that have to do with commu- 
nication. In CoWare, a component's behavior is de- 
scribed by a process that makes use of RPC to commu- 
nicate with the outside world. Such processes can be 
connected with each other without modifying their de- 
scription (modularity). By using primitive ports and prim- 
itive protocols, the designer concentrates on the func- 
tionality of the system while abstracting from terminals, 
signals, and handshakes. 

Later, when the component is instantiated in a sys- 
tem, the primitive protocol is refined into the best suited 
hierarchical protocol, taking into account the other sys- 
tem components involved. This fixes the timing diagram 
and terminals used to communicate over that port. The 
port containing the hierarchical protocol, is made hier- 
archical to add the required communication behavior 
that implements the timing diagram of the selected hierarchical
protocol. Again this is achieved without modifying
the description of any of the processes involved.

Because of this property it is feasible to construct 
libraries of functional building blocks and libraries of 
communication blocks that are re-usable: they can be 
plugged together without modifying their description. Af- 
ter blocks have been plugged together, any communi- 
cation overhead (chains of remote procedure calls) can 
be removed by in-lining the slave threads that serve the 
RPCs. The result is a description of the component in 
which function and communication are interleaved 
seamlessly and which can be compiled into software or 
hardware as efficiently as a description in the traditional 
design process. 

The above method reduces the amount of protocol 
conversions needed at the system level and allows the
selection of the communication protocol and its
implementation to be postponed until late in the design process,
in this way achieving the requirements of "design for re- 
use". The concept of hierarchical protocols is also useful 
to model off-the-shelf components ("re-use of designs"), 
because the timing diagrams according to which a proc- 
essor communicates are abstracted in it. 

The input to the implementation refinement process 
is a functional specification: a CoWare language encapsulation
consisting of a number of process instances (i.e.
host language encapsulations), exhibiting both intra-process
and inter-process communication behavior. In
a first step, allocation is performed. In this step the
number and type of processors are selected that will 
serve as the target for implementing the input specifica- 
tion. After allocating the necessary processor resourc- 
es, an assignment step is performed. In this step each 
process instance of the input specification is assigned 
to one of the allocated processors. 

The rest of the implementation path is illustrated in 
FIGURE 6. 

All process instances bound to a single processor 
are merged. This results in a system with a one-to-one 
mapping between (merged) processes and allocated 
processors. In FIGURE 6, all DFL processes (61) are 
merged (62) into a single DFL process (63). 
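
The grouping that precedes merging can be sketched as follows. The record ProcessInstance, the function merge_groups and the processor names used in the usage note are hypothetical; the check reflects the requirement that merged processes share a host language.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessInstance:
    name: str
    host_language: str


def merge_groups(assignment):
    """Group process instances by assigned processor.

    assignment maps each ProcessInstance to the name of an allocated
    processor. Every group must use a single host language.
    """
    groups = defaultdict(list)
    for process, processor in assignment.items():
        groups[processor].append(process)
    for processor, processes in groups.items():
        languages = {p.host_language for p in processes}
        if len(languages) > 1:
            raise ValueError(
                f"{processor}: cannot merge host languages {sorted(languages)}"
            )
    return dict(groups)
```

For example, two DFL processes assigned to one hardware processor form one group to be merged, while a C process assigned to a programmable core forms another.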

The host language encapsulation from each 
(merged) process instance now has to be compiled onto 
its processor target using a host language compiler. This 
comprises the following steps: 

(1) The CoWare concepts of autonomous thread,
slave thread, and shared context are implemented
on the processor target.

(2) The inter-process communication is implement- 
ed. 

(3) The resulting processors are encapsulated so 
that they can be connected with the rest of the sys- 
tem. 

In step (1) existing (commercial) host language 
compilers are re-used. When such a host language
compiler does not directly support the CoWare concepts
of autonomous thread, slave thread, and shared con- 
text, the CoWare environment supports two alterna- 
tives:

• the host language compiler is extended with libraries
that support such concepts (multi-thread library);

• software synthesis is performed to translate the
host language encapsulation to a description that
can be handled by the host language compiler.

In FIGURE 6, the DFL process (63) is compiled with 
the Cathedral compiler (64). The result is a VHDL netlist
(65) of the implementation.

In step (2) when the process is compiled on a non- 
programmable processor, the implementation of inter- 
process communication comprises the steps of refining 
the primitive ports/protocols into appropriate hierarchical
ports/protocols and merging the hierarchical ports
with the process. In FIGURE 6, hierarchical ports (66)
are added to the VHDL processes (67). After merging (68)
all processes (67), the resulting VHDL process (69) is compiled
with the Synopsys compiler (70).

In step (2) when the process is compiled on a pro- 
grammable processor, the implementation of inter-proc- 
ess communication comprises the steps of generating 
device drivers and of generating hardware interfaces. A 
software tool is used to achieve this. In the sequel this
tool is called SYMPHONY. FIGURE 7 illustrates the tool
SYMPHONY.

SYMPHONY makes use of a software model (71)
and hardware model (72) of the target processor and a
library of I/O scenarios (73) for that processor. The hardware
model (72) of the programmable processor core
consists of a HDL host-language encapsulation that formalizes
the information that is available in the hardware
section of the data sheet of the programmable processor
core. The HDL host-language encapsulation of the
hardware model is characterized by a behavioral interface
that conforms to the hardware interface of the
programmable processor. All ports (74) have hierarchical
protocols: they consist of terminals and a timing diagram.
The hardware model may also contain a HDL
description for either a black box, a simulation model,
or the full description of the processor core.

The software model (71) of the programmable processor
core consists of a software host-language encapsulation
that formalizes the information that is available
in the software section of the data sheet of the programmable
processor. The software host-language encapsulation
of the software model is characterized by a behavioral
interface that conforms to the software interface
of the programmable processor core. All ports
(75) have primitive protocols. The software model identifies,
for example, what ports can be used to get data
in or out of the processor core (memory mapped, co-processor
port, ...), what ports can be used as interrupt
ports and what their characteristics are (interrupt priority,
maskable interrupt, ...). In addition the software model
contains a behavioral description that allows a
software host language encapsulation to be compiled into
machine code, for example functions to manage processor
specific actions such as installing an interrupt vector,
enabling/disabling interrupts, etc. In FIGURE 7, the software
model of the ARM-6 RISC processor is shown, together
with a number of its ports (75). The memory
port mem is modeled as a (bi-directional) slave port.
This slave port is accessed by the device drivers, by
means of an RPC, to write/read data to/from the external
hardware. The slave thread, modeled in the software
model, attached to the mem slave port translates the
incoming RPC to a memory access. SYMPHONY
makes the connection between the device drivers and
the mem port. The fiq port is modeled as a master port.
The software model of the ARM processor ensures that
an RPC to this port is performed every time the processor
detects that an interrupt has occurred. SYMPHONY
connects the fiq port to a slave thread that serves
as the interrupt service routine, so that routine is started
automatically.

In FIGURE 7, the hardware model of the ARM-6 RISC
processor is also shown, together with a number of its
ports (74). The memory port mem is modeled
as a (bi-directional) master port. The hardware model of
the ARM-6 performs an RPC to this port every time that it
wants to write/read data to/from the RAM, ROM or memory
mapped hardware. SYMPHONY connects the mem port
to a slave thread in the hardware interface, which does
the address decoding and forwards the RPC to the appropriate
hardware block (RAM, ROM, memory mapped
hardware).

The fiq port is modeled as a slave port in the hard- 
ware model. This slave port is activated by an RPC that 
is performed to the port by the hardware interface. The 
slave thread attached to the fiq port (and modelled in 
the hardware model) sets the appropriate flag in the sta- 
tus register to the appropriate value, signalling the inter- 
rupt request. 

The link between events (RPC to ports, starting of
slave threads attached to slave ports) in the hardware
model and events in the software model is taken care
of by the processor hardware or, in case of simulation,
by the instruction-set simulator for the ARM.

SYMPHONY is based on the observation that pro- 
grammable processors have a number of common com- 
munication methods to get data in or out of the proces- 
sors. These communication methods are modeled by I/ 
O scenarios. An I/O scenario describes one way of using 
the ports of a specific processor core to map a particular 
port of a software host language encapsulation to an 
equivalent port in hardware, thereby crossing the processor
core boundary while maintaining the communication
semantics. FIGURE 8 shows an example of an I/O
scenario. It consists of a software host-language encapsulation
and a hardware host-language encapsulation
that describe a software I/O driver and the hardware
counterpart, respectively. An I/O scenario is also tagged
with some performance figures that will allow the designer
or SYMPHONY to make a decision about what I/O
scenario to use for which port.

The I/O scenario (78) of FIGURE 8 shows how an 
outmaster port psw (79) in software can be mapped to 
an outmaster port phw (80) in hardware, thereby using 
the memory port (81) of an ARM-6 RISC processor core.
The software process encapsulation P1 (82) represents 
the software I/O driver and copies data from port psw 
(79) to a specific memory address 0x08000 via an RPC 
call (84). The hardware process encapsulation P2 (83)
represents the hardware counterpart and checks wheth- 
er the memory address bus of the ARM (modeled by the 
protocol indices of the memory port (81)) equals 
0x08000. If this is the case, data that is residing on the 
memory data bus of the ARM is copied to port phw (80) 
via an RPC call (85). 
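
The scenario of FIGURE 8 can be paraphrased in a few lines of Python. This is only an illustrative sketch: the names MemoryPort, make_p2_decoder and p1_driver are hypothetical and are not part of CoWare or SYMPHONY; only the address 0x08000 and the roles of P1 and P2 are taken from the description above.

```python
# Illustrative sketch only; all names are hypothetical, not CoWare/SYMPHONY APIs.
IO_ADDRESS = 0x08000  # memory address used by the scenario of FIGURE 8


class MemoryPort:
    """Stands in for the memory port (81): forwards each write to the hardware side."""

    def __init__(self, hw_listener):
        self.hw_listener = hw_listener

    def write(self, address, data):
        # The RPC on the memory port carries the address (protocol index) and the data.
        self.hw_listener(address, data)


def make_p2_decoder(phw):
    """Hardware counterpart P2 (83): forwards data to port phw only for IO_ADDRESS."""
    def p2(address, data):
        if address == IO_ADDRESS:
            phw(data)  # corresponds to RPC call (85) on port phw (80)
    return p2


def p1_driver(mem_port, data):
    """Software I/O driver P1 (82): copies data from psw to IO_ADDRESS (RPC call (84))."""
    mem_port.write(IO_ADDRESS, data)


received = []                                   # models what arrives on port phw
mem = MemoryPort(make_p2_decoder(received.append))
p1_driver(mem, 42)                              # a value sent on psw reaches phw
mem.write(0x00010, 7)                           # an ordinary memory access is ignored by P2
```

In this sketch the ordinary memory access at 0x00010 never reaches phw, which mirrors the address comparison performed by P2.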

The library of I/O scenarios comprises: 

• Memory-Mapped I/O scenarios. These provide a 
data-transfer mechanism that is convenient be- 
cause it does not require the use of special proces- 
sor instructions, and can implement practically as 
many input or output ports as desired. In memory- 
mapped I/O, portions of the address space are as- 
signed to input and output ports. Reads and writes 
to those addresses are interpreted as commands 
to the I/O ports. "Sending" to a memory-mapped lo- 
cation involves effectively executing a "Store" in- 
struction on a pseudo-memory location connected 
to an output port, and "Receiving" from a memory- 
mapped location involves effectively executing a 
"Load" instruction on a pseudo-memory location 
connected to an input port. When these memory op- 
erations are executed on the portions of address 
space assigned to memory-mapped I/O, the mem- 
ory system ignores the operation. The I/O unit, how- 
ever, sees the operation and performs the corre- 
sponding operation to the connected I/O ports. The 
number of memory locations assigned for memory- 
mapped I/O will depend on the number of ports that 
a software processor component has to "physically" 
implement. SYMPHONY proposes an assignment 
of address locations to channels that will result in 
simple address decoding logic. However, the user 
can always override the proposed assignment. 

• Instruction-Programmed I/O scenarios. Some processors
also provide special instructions for accessing
special I/O ports provided with the processor itself.
Using this scheme, these special communication
ports of the processor are connected to the external
channels via the I/O unit. In addition to providing
hardware support for memory-mapped and
instruction-programmed I/O, the I/O unit also provides
support for hardware interrupt control. Interrupts
are used for different purposes, including the
coordination of interrupt-driven I/O transfers. Different
processors provide different degrees of hardware
interrupt support. Some processors provide direct
access to a number of dedicated interrupt signals.
Our I/O unit architecture makes use of these signals
when available. If more interrupt "channels" are required,
for example to support a number
of interrupt-driven communication channels, we
use the strategy of interrupt vectors. Interrupt vectors
are pointers or addresses that tell the processor
core where to jump to for the interrupt service routine.
In effect, this is a kind of memory-mapped interrupt
handling.

Once an I/O scenario is selected for every port of 
the software host-language encapsulation, SYM- 
PHONY generates the necessary communication 
software and the corresponding hardware I/O unit 
by combining the selected I/O scenarios. The generated
communication software, the software model
of the processor core and the software host-language
encapsulation itself are merged and compiled
with the processor-specific C compiler.
The result of SYMPHONY is a refinement of the 
original host language encapsulation into a CoWare 
encapsulation of which the behavioral interface is 
identical to that of the original encapsulation. 

In addition, SYMPHONY adds RAM and ROM
blocks to store the program code and data (in FIGURE 7,
the RAM and ROM are not shown explicitly: they are
part of the HW interface).
SYMPHONY effectively replaces a software encapsula- 
tion by a hardware encapsulation that has equivalent 
functionality. 

In step (3) when two processors are not protocol 
compatible, a protocol conversion process is inserted. 
In FIGURE 6, the processor (65) compiled with Cathe- 
dral-2/3 and the off-the-shelf processor (76) have in- 
compatible protocols. Protocol conversion (77) is re- 
quired to make them compatible. 

A digital system in the CoWare design environment 
can be simulated. Simulation is an implementation of the 
digital system on one or more general-purpose comput- 
ers. The implementation process outlined above can be 
followed to construct a simulation. FIGURE 9 illustrates 
the construction of a simulation. For simulation the tar- 
get processors are simulators (86) running on process- 
es (87) of the operating systems (88) that run on the 
general-purpose computers. Allocation and assignment 
determine the simulation architecture. Arbitrary simula- 
tion architectures are supported by the CoWare design 
environment. Support is provided to select an optimal 
architecture for a given simulation speed and debugging 
visibility. 



The host language compilers mentioned in step (1)
are now the simulation engines for the host languages. 

In step (2), the inter-process communication now
consists of two parts. In a first part the communication
is realized from the simulation engine to the OS process
on which the simulation engine is running. In a second
part the communication is realized between two different
OS processes over the OS and network layer. The
communication between the simulation engine and the
OS process is performed via the application programmer's
interface of the simulation engine. The communication
between two different OS processes is done
through the OS inter-process communication primitives
(e.g. shared memory and semaphores for two processes
on a single OS, or TCP/IP sockets for two processes
on distinct computers).

When the simulation engine used has a fixed interface,
as for example an instruction set simulator for a
programmable processor, then the hardware/software
interface is generated with SYMPHONY and can be
simulated as any other process.

The CoWare design environment supports multi-abstraction-level
simulation, which is the key to efficient
co-simulation. It allows the processes under
debug to be simulated at an appropriately low level of
abstraction for debugging purposes, while the other processes
in the system are simulated at the highest appropriate
abstraction level for maximal speed. The time-consuming
low abstraction level simulation is limited to the smallest
possible part of the system under simulation, while that
part can still be simulated in the system context.

Because both simulation and implementation follow
the same design process, it is possible to construct hybrid
simulation architectures in which part of the system
is implemented by simulators running on OS processes
and part of the system is implemented by actual hardware.
This is just one more manifestation of the heterogeneity
of digital system architectures.

In a specific embodiment of the present invention,
an application is disclosed below for hardware/software
co-design of a pager application.

SPECIFICATION OF THE PAGER 

Each block (89) in FIGURE 10 corresponds to a
process implementing a specific function of the pager.
This functional decomposition determines the initial partitioning.
The arrows (90) in between the processes represent
primitive channels with RPC semantics.
FIGURE 11 shows the RPC communication in detail
for part of the pager design. The blocks (91, 92, 93, 94,
95, 96, 97, 98, 99, 100, 101) in FIGURE 11 correspond
to the processes (89) from FIGURE 10.

The Sample Clock Generator process (94) contains
an autonomous thread (102). This thread runs continuously.
It performs an RPC (103) over its input port ip
(104) to the Tracking & Acquisition process (93) to obtain
a new value for delta. The autonomous thread (102)
of the process (94) adds the delta parameter to some
internal variable until a threshold is exceeded. In this
way it implements a sawtooth function. When the sawtooth
exceeds the (fixed) threshold an RPC call (105) is
issued to the A/D converter process (95). The autonomous
thread (102) of the Sample Clock Generator (94)
performs an RPC (105) (gives a sample clock tick) every
threshold/delta iterations (real clock cycles).
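
The sawtooth behaviour can be sketched as follows. This is an illustration under stated assumptions, not the code of FIGURE 11: the function name is hypothetical, delta is held constant for the whole run, and the accumulator is assumed to wrap by subtracting the threshold (the description does not say how the internal variable is reset).

```python
def sample_clock_ticks(delta, threshold, iterations):
    """Return the iteration indices at which a sample clock tick (RPC 105) would be issued.

    Assumption: after the threshold is exceeded, the accumulator wraps by
    subtracting the threshold. The source text does not specify the reset rule.
    """
    accumulator = 0
    ticks = []
    for i in range(iterations):
        accumulator += delta              # add delta to the internal variable
        if accumulator > threshold:       # the sawtooth exceeds the fixed threshold
            ticks.append(i)               # a tick: RPC to the A/D converter process
            accumulator -= threshold      # assumed wrap-around of the sawtooth
    return ticks


slow = sample_clock_ticks(delta=1, threshold=4, iterations=13)
fast = sample_clock_ticks(delta=2, threshold=4, iterations=7)
```

With these integer values the ticks are threshold/delta iterations apart (4 and 2 respectively), so a larger delta supplied by the Tracking & Acquisition process yields a faster sample clock.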

The slave thread clock (106) in the A/D converter 
process (95) samples the analogue input, and sends the 
result to the Down-conversion process (100) via an RPC
call (107). This in turn will activate the Decimation proc- 
ess (99) via an RPC call, etc. 

The Correlator Noise Estimator process (98) con- 
tains a slave thread (108) associated with port ip (109) 
to compute the correlation values. This slave thread 
(108) is activated when the Phase Correction process 
(97) writes data to the Correlator Noise Estimator proc- 
ess (98) (i.e. when the Phase Correction process (97) 
performs an RPC (110) to the ip (109) port of the Cor- 
relator Noise Estimator process (98)). The slave thread 
(108) reads in the data and then performs an RPC (111) 
to the User Interface process (91) to obtain a new value
for the parameter par it requires for computing the correlation
values. Finally, the new correlation results are
sent to the Tracking Acquisition process (93) via an RPC
call (112) on its op port (113).

The slave thread (114) in the Tracking Acquisition 
process (93) updates the delta value for the sawtooth 
function implemented by the Sample Clock Generator 
process (94). It puts the updated value in the context 
(115), where it is retrieved by the slave thread op (116) 
which serves RPC requests from the Sample Clock 
Generator process (94). In this way the Tracking Acqui- 
sition process (93) influences the frequency of the clock 
generated by the Sample Clock Generator process (94). 
This example shows how the context (115) is used for 
communication between threads inside the same proc- 
ess whereas the RPC mechanism is used for commu- 
nication between threads in different processes. The 
locking (117) and unlocking (118) of the context (115) is
required to avoid concurrent accesses to the variable 
delta. The lock (117) in the slave thread op (116) locks 
the context (115) for read: other threads are still allowed 
to read from the context (115), but no other thread may 
write the context (115). The lock (119) in the slave thread
ip (114) locks the context (115) for write: no other thread 
is allowed to write or read the context (115) until it is 
unlocked again. 
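
The read/write locking of the context can be illustrated with a small Python sketch. The class and function names are hypothetical, and the lock is a plain readers/writer lock built on threading.Condition; it is meant only to show the semantics described above (shared read locks, exclusive write lock), not the CoWare implementation.

```python
import threading


class Context:
    """Shared context holding delta, guarded by a simple read/write lock."""

    def __init__(self, delta=0):
        self.delta = delta
        self._cond = threading.Condition()
        self._readers = 0       # number of threads currently holding a read lock
        self._writer = False    # True while one thread holds the write lock

    def lock_read(self):
        with self._cond:
            while self._writer:                    # readers only wait for a writer
                self._cond.wait()
            self._readers += 1

    def lock_write(self):
        with self._cond:
            while self._writer or self._readers:   # a writer needs exclusive access
                self._cond.wait()
            self._writer = True

    def unlock(self):
        with self._cond:
            if self._writer:
                self._writer = False
            else:
                self._readers -= 1
            self._cond.notify_all()


def slave_ip(ctx, new_delta):
    """Models slave thread ip (114): updates delta under a write lock."""
    ctx.lock_write()
    try:
        ctx.delta = new_delta
    finally:
        ctx.unlock()


def slave_op(ctx):
    """Models slave thread op (116): serves a read request under a read lock."""
    ctx.lock_read()
    try:
        return ctx.delta
    finally:
        ctx.unlock()


ctx = Context(delta=1)
slave_ip(ctx, 3)           # Tracking Acquisition publishes a new delta
current = slave_op(ctx)    # Sample Clock Generator fetches it through its RPC
```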

Each process is described in the language that
best fits the characteristics of the function it implements.
The data-flow blocks (NCO (101), Down-conversion
(100), Decimation (99), Chip Matched Filter (96),
Phase Correction (97), Correlator Noise Estimator (98),
and Sample Clock Generator (94)) are described in
DFL. The control-oriented blocks (Tracking Acquisition
(93), Frame Extraction (92) and User Interface (91)) are
described in C. The code in FIGURE 11 is pseudo-code
meant for illustration and does not correspond to the actual
code.

DESIGN PROCESS 

After the initial specification of the system has been 
validated by simulation, the designer starts the refine- 
ment process. 

At this moment it is not yet decided what process
will be implemented on what kind of target processor,
nor is it defined how the RPC communication will be refined.
However, the choice of the specification language
for each process restricts the choice of the component
compiler and in that sense partly determines the target
processor. Hence, studying possible alternative assignments
of a process to a target processor may require
the availability of a description of the process in more
than one specification language or a clear guess of the
best solution.

ALLOCATION AND ASSIGNMENT 

This step determines what processes will be implemented
on what target processor. The initial specification
shows the finest grain partitioning: a process in the
initial specification will never be split over several processors.
However, it may be worthwhile to combine a
number of processes inside a single processor. This is
achieved by merging these processes into a single process
that can then be mapped on the selected target
processor by a host language compiler. Merging of processes
is only allowed when the processes are described
in the same specification language. Hence, studying
possible alternative mergers may require that for a
number of processes (e.g. Correlator Noise Estimator
process) a description is available in more than one
specification language. After allocation and assignment,
one obtains a description with a one-to-one mapping of
merged processes to processors.
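
The merging rule stated above (same specification language only) amounts to a simple check. The sketch below is hypothetical: the table and the function can_merge are illustrative names, with the language of each process taken from the pager specification.

```python
# Hypothetical process table; the languages follow the pager specification.
PROCESSES = {
    "NCO": "DFL",
    "Down-conversion": "DFL",
    "Decimation": "DFL",
    "Correlator Noise Estimator": "DFL",
    "Tracking Acquisition": "C",
    "Frame Extraction": "C",
    "User Interface": "C",
}


def can_merge(names, table=PROCESSES):
    """Merging is only allowed when all processes share one specification language."""
    return len({table[name] for name in names}) == 1


dfl_group_ok = can_merge(["NCO", "Down-conversion", "Decimation"])
mixed_group_ok = can_merge(["Correlator Noise Estimator", "Tracking Acquisition"])
```

A mixed group such as the second one would only become mergeable if a description of the Correlator Noise Estimator in C were also available.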

In the pager example (FIGURE 12), the following allocation,
merging and assignment take place.

The NCO (120), Down-conversion (121), and Decimation
(122) processes are merged and mapped in
hardware onto an application specific DSP processor
(123) because the sample rate of the merged processes
is identical, which implies that they can be clocked at the
same frequency. The advantage is that only one clock
tree needs to be generated per merged process (instead of
one per original process). An additional advantage is
that the scan-chains for the processes that are merged
can be combined.

The Chip Matched Filter (124), and Phase Correction
(125) processes are merged and mapped onto a
CATHEDRAL-3 processor (126) because their sample
rates are identical.

The Correlator Noise Estimator process (127) is
mapped onto a CATHEDRAL-3 processor (128). It is not
merged with the Phase Correction process (125) because
it operates at a four times lower frequency.

The Sample clock generator (129) is mapped onto 
a CATHEDRAL-3 processor (130). 

Tracking Acquisition (131), Frame Extraction (132), 
and User Interface (133) are merged and mapped on a
programmable processor (134). For this design an 
ARM6 processor is chosen. 

The Hardware/Software tradeoffs are based on the
following observations. To obtain a maximal degree of
flexibility, as much of the functionality as possible is implemented
in software on the ARM6 (134). However,
due to performance constraints of the ARM6 processor
(134), there is a limit to what can be implemented in software.
The two main factors that play a role in this problem
are the following.

The Tracking Acquisition process (131) has to
be implemented in software because the algorithm used
to perform tracking and acquisition may be modified depending
on the application domain of the pager system.

The Correlator Noise Estimator process (127) is not
included in software because the input rate for the Correlator
Noise Estimator (127) is too high to realize a real-time
communication between the ARM6 and the Phase
Correction process (125). In addition, an estimation of
the number of cycles required to execute each function
on the ARM6 shows that the implementation of the Correlator
Noise Estimator process (127) in software leaves
insufficient time to perform tracking and acquisition in
between every two symbols.

After merging, each of the merged processes can 
now be implemented on a separate target processor by 
the appropriate compiler. The communication between 
the merged processes is still done via primitive ports and 
channels. 

Communication Mechanism Selection 

After the partitioning of the system has been verified 
by simulation and before the actual implementation 
takes place, the designer may choose to refine the com- 
munication mechanism between the processors. This 
can be achieved by making explicit the behavior of the 
channels between the processors. 

In the running example, the processors can, in prin- 
ciple, operate concurrently because each processor has 
its own thread of control. By refining the RPC based 
communication scheme we can pipeline the processors: 
all processors operate concurrently and at I/O points 
they synchronize. This refined communication scheme 
is called Blocked/UnBlocked Read/Write communication.
FIGURE 13 shows the pager with the refined
communication mechanism. The inputs and outputs of the
processors have been labeled with BW for Blocked 
Write, BR for Blocked Read, and UBR for UnBlocked 
Read. 

BW-BR communication guarantees that no data is
ever lost. When the writing process has data available,
it will signal that to the reading process. If the reading
process is at that moment not ready to receive the data
(because it is still processing the previous data), the writing
process will block until the reading process is ready
to communicate. Alternatively, if the reading process
needs new data, it will signal that to the writing process.
If the writing process is at that moment not ready to send
the data (because it is still computing the data), the reading
process will block until the writing process is ready
to communicate. The BW-BR scheme is used in the
main signal path.

A BW-BR scheme, however, is not used for the parameter
and mode setting for the main signal path. If an
accelerator uses BR to read a parameter value it will be
blocked until the parameter is provided. Since the parameter
setting is done in software, this will slow down
the computations in the main signal path considerably.
Therefore parameter setting is done via a BW-UBR
scheme. This makes sure that every parameter change
is read by the accelerators, but it leaves it up to the accelerator
to decide when to read the parameter.

In the CoWare design environment the refinement
of the communication mechanism is performed by making
use of a hierarchical channel. A hierarchical channel
replaces a primitive channel by a process that describes
how communication over that channel is carried out.

The introduction of BW-BR communication is
shown in detail for the Chip Matched Filter & Phase Correction
(135) and Correlator & Noise Estimator process
(136) in FIGURE 14.

The BWBR channel (137) contains an autonomous
thread (138) and a slave thread (139) that communicate
with each other via the shared variable tmp[0..7] in the
context (140). The slave thread (139) is activated by an
RPC (141) from the CMF & Phase Correction process
(135) and it tries to update the context (140) with new
values. The autonomous thread (138) continuously tries
to read the values from the context (140) and send them
to the output port (144) via an RPC (142), in this way
activating the Correlation & Noise Estimator process
(136) that is attached to that output (144). The blocking
character of the communication is taken care of by the
use of a binary semaphore rw (143). This guarantees
that the input thread (139) will block until the previous
data has been read by the autonomous thread (138) (no
data is overwritten before it has been read), and that the
autonomous thread (138) will block until new data is
available (no data is read twice). When the input slave
thread (139) is blocked, the CMF & Phase Correction
process (135) that requested its service via an RPC
(141) is also blocked because the RPC (141) will only
return after the slave thread (139) has completed. When
the autonomous thread (138) is blocked, there are no
RPC requests to the Correlator Noise Estimator process
(136), so that process (136) is blocked automatically.

In the case of Blocked-Write, UnBlocked-Read communication,
the code for the autonomous thread is
slightly modified. The thread always sends the value
stored in the context, without checking whether it is updated.
The same value can be sent more than once, but
the thread will never be blocked. The input slave thread
is identical to the BWBR case, and will block until the
data has been read.

In both cases, locking and unlocking of the context
is required to avoid concurrent accesses to the shared
variable in the context and, as such, has nothing to do
with the blocking character of the communication.
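
The blocking behaviour of the BWBR channel can be sketched with Python threads. This is an illustration only, not the C-like code of FIGURE 14: the class and method names are hypothetical, and the single binary semaphore rw (143) of the description is replaced here by a pair of semaphores (one signalling "slot empty", one signalling "slot full"), which is a common way to obtain the same two guarantees. A separate lock protects the shared variable, in line with the remark above that locking is independent of blocking.

```python
import threading


class BWBRChannel:
    """Blocked-Write/Blocked-Read channel sketch (hypothetical names)."""

    def __init__(self, output_port):
        self.output_port = output_port         # callable standing in for RPC (142)
        self._tmp = None                       # shared variable in the context
        self._lock = threading.Lock()          # context lock (mutual exclusion only)
        self._empty = threading.Semaphore(1)   # slot may be written
        self._full = threading.Semaphore(0)    # slot holds unread data

    def write(self, values):
        """Slave thread body: returns only once the data is stored."""
        self._empty.acquire()                  # block until previous data was read
        with self._lock:
            self._tmp = list(values)
        self._full.release()

    def forward_once(self):
        """One iteration of the autonomous thread."""
        self._full.acquire()                   # block until new data is available
        with self._lock:
            values = self._tmp
        self._empty.release()
        self.output_port(values)               # activates the reading process


received = []
channel = BWBRChannel(received.append)
reader = threading.Thread(target=lambda: [channel.forward_once() for _ in range(3)])
reader.start()
for k in range(3):
    channel.write([k] * 8)                     # tmp[0..7] carries eight values
reader.join()
```

Under the same assumptions, a Blocked-Write/UnBlocked-Read variant would drop the wait for new data on the reading side, so that the last stored value may be forwarded more than once.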

Implementation of the pager 

After the newly introduced communication mecha- 
nism has been verified by simulation, each process has 
to be synthesized on its assigned target processor. 

Implementation of a Process in Hardware 

FIGURE 15 illustrates the pure hardware imple- 
mentation for the Correlator & Noise Estimator process 
(145) and the merged Phase Correction and Chip 
Matched Filter process (146). 

This hardware implementation for the pager con- 
sists of three distinct steps: 

The (merged) DFL processes are synthesized by 
the CATHEDRAL silicon compiler. The compiler gener- 
ates processors of which all the inputs and outputs are 
of the master type. These processors are shown in FIG- 
URE 15 as the inner rectangles (147, 148). 

Each processor is encapsulated to make it consist- 
ent with the specification in which the DFL processes 
have slave inputs. In addition, the encapsulation in- 
cludes clock gating circuitry to control the activity of the 
processor. The encapsulated processors are shown in 
FIGURE 15 as the big rectangles (149, 150): they include
the processor generated by CATHEDRAL (147,
148) and some encapsulation hardware (151, 152, 153).
As can be observed the input ports (154, 155) of the 
encapsulated processors (149, 150) are now of the 
slave type. The encapsulation hardware (151, 152, 153) 
is shown in detail in FIGURE 16 as the blocks (155, 156, 
157). 

The BWBR process is implemented in hardware. In 
this case we obtain the gate-level implementation of this 
process from the library. This implementation is functionally
equivalent to the original C-like description (137)
of this block in FIGURE 14. FIGURE 16 shows the detailed
implementation (158) of the BWBR process that
is used in the main signal path of the pager. 

Implementation of a Process in Software

To simplify the discussion we will only look at the
transfer of the 14 correlation values to the Tracking Acquisition
process, and the setting of a parameter value
by the User Interface. We also know that the transfer of
the correlation values has to be BWBR and the transfer
of the parameter value has to be BWUBR.

The hardware interface and the software I/O device
driver are generated automatically with SYMPHONY. To
generate these interfaces SYMPHONY analyses the
ports of the software process. For each of these ports,
SYMPHONY scans the library of I/O scenarios for an
applicable scenario. The user is asked to select the most
appropriate scenario amongst the applicable ones.
SYMPHONY then combines the selected scenarios into
a software I/O device driver and a hardware interface.

In the example (FIGURE 17) there are two ports to 
be implemented: 

bool[32][14] corr: inslave (159) is used to transfer
the correlation values. This is a port of type inslave
that transports an array of 14 bit vectors of size 32.

bool par: outmaster (160) is used to set a parameter
of the correlation block. This is a port of type outmaster
that transports a boolean value.

For the corr port (159) SYMPHONY proposes the 
scenario depicted in FIGURE 17. The memory port of 
the ARM will be used to transfer the correlation values 
and the FIQ port of the processor will be used to initiate 
the transfer. The I/O scenario describes what blocks 
need to be inserted in software and hardware to realize 
this kind of communication. In total three hardware and
three software blocks are required to implement the
communication over the corr port.

Unpack (161). The memory port of the ARM is obviously
not wide enough to transfer the 14 correlation values
in parallel. Therefore, the scenario will sequentialize
the transfer. Of the 14 correlation values 13 will be
stored internally in the Unpack block (161). The 14th
value is sent to the Split block (162).

Split (162) stores the 14th value internally and then
activates the FIQ port of the ARM processor. Activating
the FIQ port (163) of the hardware model (164) has as
a consequence that an RPC is issued on the interrupt
port (165) of the software model (166). This port (165)
is connected with the Join (167) block.

Join (167) retrieves the 14th correlation value by issuing
an RPC to the corresponding input port (168) of
the Demux block (169).

Demux (169). The data transfer is implemented through
memory mapped I/O. Therefore, when selecting this
I/O scenario, the user should decide on the address
that will be used for the transfer. When one of the
input ports of the Demux block is activated, an RPC
to the memory port (170) will be performed with an
address that corresponds to the activated input port.

Mux (171). At the hardware side the memory port
(172) issues an RPC to the Mux (171) block whenever
it (172) is activated. In that block (171), the address will
be decoded and the corresponding output port will be
activated to retrieve the correlation value that was
stored locally in hardware (either in the Split (162) or in
the Unpack block (161)).

Pack (173). After the 14th correlation value has been
retrieved by the Join block (167), it is passed on to the
Pack block (173), that will then retrieve the 13 other
correlation values by issuing consecutive RPCs to the
different ports of the Mux block (169). Finally, when the
Pack block (173) has retrieved all 14 values, it packs
them in an array and activates the original software
application code in the Tracking & Acquisition process
(174).

All these blocks are described in a generic way in a 
library where they can be retrieved and customized by 
SYMPHONY. 
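
The sequentialized transfer over the corr port can be sketched as follows. The code is a simplified illustration and not the library blocks themselves: the class names, the base address 0x08000 and the one-address-per-value layout are assumptions, and Unpack/Split as well as Join/Demux/Pack are folded into one object per side for brevity.

```python
N_VALUES = 14            # number of correlation values on the corr port
BASE_ADDRESS = 0x08000   # hypothetical base address; in practice chosen by the user


class HardwareInterface:
    """Hardware side: Unpack and Split store the values, Mux serves the reads."""

    def __init__(self):
        self.stored = [0] * N_VALUES
        self.fiq = None                     # set to the software interrupt handler

    def corr(self, values):
        assert len(values) == N_VALUES
        self.stored = list(values)          # Unpack keeps 13 values, Split the 14th
        self.fiq()                          # Split activates the FIQ port

    def mem_read(self, address):
        # Mux: decode the address and return the locally stored value.
        return self.stored[address - BASE_ADDRESS]


class SoftwareDriver:
    """Software side: Join, Demux and Pack, running as the interrupt routine."""

    def __init__(self, hardware, application):
        self.hardware = hardware
        self.application = application

    def on_fiq(self):
        read = self.hardware.mem_read       # Demux: one address per value
        last = read(BASE_ADDRESS + N_VALUES - 1)                        # Join
        first = [read(BASE_ADDRESS + i) for i in range(N_VALUES - 1)]   # Pack
        self.application(first + [last])    # activate the application code


delivered = []
hw = HardwareInterface()
sw = SoftwareDriver(hw, delivered.append)
hw.fiq = sw.on_fiq
hw.corr(list(range(100, 114)))              # 14 correlation values arrive in hardware
```

The application callback receives all 14 values as one array, after 14 individual memory-mapped reads triggered by a single interrupt.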

The solution for the par port (160) is much simpler. 
Since it is an outmaster port it can directly be mapped 
on the memory port. However, since the memory port is 
already used, an extra multiplexer (175) is required. This
is shown in FIGURE 17. To implement the unblocked
read character of the transfer, an extra register (176) is
required on the hardware side.

Before going on with the implementation path, all 
processes that were added by SYMPHONY are 
merged. The hardware interfaces are merged into one 
hardware interface block that can then be implemented 
with RT-level synthesis tools. The I/O device driver processes
are merged with the original SW application code.
As a consequence of the in-lining, the complete tracking
and acquisition slave thread moves into the interrupt routine.
Whenever new correlation values are ready, the
main software thread is interrupted to run the tracking 
and acquisition algorithm. After that interrupt is proc- 
essed, the main thread resumes. 

In the above description a design environment and 
a design methodology meeting the requirements of 
modularity, encapsulation of different description lan- 
guages, modeling from a heterogeneous conceptual 
specification to a resulting heterogeneous architecture 
and all refinement steps inbetween, modeling capabili- 
ties for off-the-shelf components and the associated de- 
sign environments, separation between functional and 
communication behavior and processor independent in- 
terface synthesis have been disclosed. Yet it is apparent 
that other embodiments of the present invention may be 
obvious to the person skilled in the art, the spirit and 
scope of the present invention being limited only by the 
terms of the appended claims. 



Claims 

1. A design environment for implementing an hetero- 
geneous essentially digital system, comprising: 

a database compiled on a computer, adapted 
for access by executable programs on said 
computer for generating the implementation of 
said heterogeneous essentially digital system, 
comprising a plurality of objects representing 
aspects of said digital system wherein said objects
comprise primitive objects representing
the specification of said digital system and hierarchical
objects being created by said executable
programs while generating the implementation
of said digital system, said hierarchical
objects being refinements of said primitive
objects and having more detail and preserving
any one or all of said aspects to thereby generate
said implementation of said digital system;
and further comprising relations inbetween
said primitive objects and inbetween
said hierarchical objects and between said
primitive objects and said hierarchical objects;
and further comprising functions for manipulating
said objects and said relations;
means for specifying said heterogeneous dig- 
ital system comprising a plurality of behavioral 
and structural languages; 
means for simulating said heterogeneous dig- 
ital system comprising a plurality of simulators 
for said behavioral and structural languages; 
means for implementing said heterogeneous 
digital system comprising a plurality of compil- 
ers for said behavioral and structural languag- 
es; 

means for allocating hardware components for 
an implementation of said heterogeneous dig- 
ital system; 

means for assigning hardware subsystems and 
software subsystems of said heterogeneous 
digital system to said hardware components; 
means for implementing the communication 
between said software subsystems and said 
hardware subsystems, one of the aspects of 
said communication being represented by 
ports; 

means for encapsulating said simulators, said 
compilers, said hardware components, said 
hardware subsystems and said software sub- 
systems whereby creating a consistent com- 
munication between said encapsulated simula- 
tors, compilers, hardware components, hard- 
ware subsystems and software subsystems; 
and 

means for creating processor models of said 
hardware components as objects in said data- 
base, said models comprising software models 
representing the software views on said hard- 
ware components and hardware models repre- 
senting the hardware views on said hardware 
components. 

2. The design environment as recited in claim 1 further
comprising means for creating I/O scenario models 
of said ports as objects in said database, said I/O 
scenario models representing the implementation 
of said ports on said hardware components, said 
implementation comprising software subsystems, 
hardware subsystems, and processor models with 
connections therebetween. 

3. The design environment as recited in claim 2
wherein the implementation of the communication
between a first software subsystem and a first hard- 
ware subsystem results in said first software sub- 
system with a first port being replaced by a second 
hardware subsystem with a second port, said first
port and said second port representing an essen- 
tially identical communication. 

4. The design environment as recited in claim 3 further 
comprising:

means for selecting I/O scenario models for the 
ports of said first software subsystem; 
means for combining the software subsystems 
of said selected I/O scenarios;
means for combining the hardware subsystems 
of said selected I/O scenarios. 

5. The design environment as recited in claim 4 
wherein a first I/O scenario model represents the
connection of said first port to said second port, said 
connection comprising a connection of said first port
to said software subsystems of said I/O scenario
model, connections of said software subsystems of 
said I/O scenario model to said software model,
connections of said hardware model to said hard- 
ware subsystems of said I/O scenario model, and a 
connection of said hardware subsystems of said I/O
scenario model to said second port.

6. The design environment as recited in claim 5 
wherein I/O scenario models comprise memory 
mapped I/O scenarios, instruction programmed I/O 
scenarios, interrupt based I/O scenarios. 

7. The design environment as recited in claim 1 
wherein said implementation is a simulation of said 
digital system. 

8. The design environment as recited in claim 7
wherein said simulation is a multi-platform simula- 
tion being executed on a plurality of computers. 

9. The design environment as recited in claim 8 
wherein said simulation is a hybrid simulation comprising
substantially simultaneous hardware implementations
and computer simulations.

10. The design environment as recited in claim 1 
wherein said implementation is a heterogeneous
implementation comprising hardware subsystems 
and software subsystems, said software subsystems
being executed on one or more of said hardware
subsystems.

11. The design environment as recited in claim 10 
wherein said hardware subsystems comprise any 
one or more of processor cores, off-the-shelf components,
custom components, ASICs, processors,
or boards. 

12. The design environment as recited in claim 10 
wherein said aspects are functional or communica- 
tion or concurrency or structural aspects of said dig- 
ital system. 

13. A method of making an implementation of a heter- 
ogeneous essentially digital system, said imple- 
mentation comprising hardware and software sub- 
systems of said system, said software subsystem 
being executed on one or more of said hardware 
subsystems, comprising the steps of:

defining a first set of primitive objects repre- 
senting the specification of said digital system, 
comprising the steps of: 
describing the specification of said system in 
one or more processes, each process repre- 
senting a functional aspect of said system, said 
processes being primitive objects; 
defining ports and connecting said ports with 
channels, said ports structuring the communi- 
cation between said processes, said ports and 
said channels being primitive objects, one proc- 
ess having one or more ports; 
defining the communication semantics of said 
ports by a protocol, said protocol being a prim- 
itive object; 
and thereafter 

creating hierarchical objects being refinements 
of said primitive objects and having more detail, 
while preserving aspects of said communica- 
tion semantics; 

allocating one or more hardware components, 
said components comprising programmable 
processors and non-programmable proces- 
sors; 

assigning said processes to said hardware 
components, the processes being assigned to 
a programmable processor being a software 
subsystem, the other processes being hardware
subsystems; and

selecting I/O scenario models for the ports of 
said software subsystem thereby connecting 
said ports to the interface of said programmable 
processor and connecting the interface of said 
programmable processor to second ports, said 
second ports representing an essentially iden- 
tical communication as said ports. 

14. The method as recited in claim 13 further comprising
the step of simulating said system.

15. The method as recited in claim 13 wherein said 
hardware subsystems comprise any one or more of 
processor cores, off-the-shelf components, custom 
components, ASICs, processors, or boards.

16. The method as recited in claim 15 further comprising
the step of refining the channel inbetween a first
and a second port of respectively a first and a second
hardware component, said first and said second
port having an incompatible protocol, thereby
creating a hierarchical channel, said hierarchical
channel converting the first protocol into the second
protocol.

17. The method as recited in claim 16 further comprising
the step of refining the channels inbetween incompatible
ports of hardware components, thereby
creating hierarchical channels.

18. The method as recited in claim 17 further compris- 
ing the step of generating a netlist comprising the 
layout information of said implementation. 
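
Illustrative note (not part of the claims): the assigning step of claim 13 and the channel refinement condition of claim 16 can be sketched in Python as below. This is only a minimal sketch under stated assumptions. All names (`Port`, `Process`, `Processor`, `Channel`, `partition`, `needs_refinement`) and the protocol labels `"handshake"` and `"streaming"` are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Port:
    name: str
    protocol: str  # hypothetical label for the port's communication semantics


@dataclass
class Process:
    name: str
    ports: list = field(default_factory=list)  # one process has one or more ports


@dataclass(frozen=True)
class Processor:
    name: str
    programmable: bool  # programmable vs. non-programmable hardware component


@dataclass(frozen=True)
class Channel:
    first: Port
    second: Port


def partition(assignment):
    """Split assigned processes into software and hardware subsystems.

    `assignment` is a list of (Process, Processor) pairs. A process assigned
    to a programmable processor counts as software; all others as hardware.
    """
    software, hardware = [], []
    for process, processor in assignment:
        target = software if processor.programmable else hardware
        target.append(process.name)
    return software, hardware


def needs_refinement(channel):
    """A channel between ports with differing protocols would need a
    hierarchical (protocol-converting) channel in this sketch."""
    return channel.first.protocol != channel.second.protocol


# Hypothetical example: one process on a CPU, one on an ASIC.
cpu = Processor("cpu", programmable=True)
asic = Processor("asic", programmable=False)
control = Process("control", [Port("out", "handshake")])
filt = Process("filter", [Port("in", "streaming")])

software, hardware = partition([(control, cpu), (filt, asic)])
link = Channel(control.ports[0], filt.ports[0])
```

In this example `control` would fall into the software subsystem and `filter` into the hardware subsystem, and `link` joins ports with differing protocol labels, so it is the kind of channel that claim 16 refines into a hierarchical channel. The sketch does not model I/O scenario selection or processor models.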